Abstract

Suppose $x$ is any exactly $k$-sparse vector in $\mathbb{R}^n$. We present a class of "sparse" matrices $A$, and a corresponding
algorithm that we call SHO-FA (for Short and Fast¹) that, with high probability over $A$, can reconstruct $x$ from $Ax$.
The SHO-FA algorithm is related to the Invertible Bloom Lookup Tables (IBLTs) recently introduced by Goodrich
et al., with two important distinctions - SHO-FA relies on linear measurements, and is robust to noise. The
SHO-FA algorithm is the first to simultaneously have the following properties: (a) it requires only $O(k)$ measurements,
(b) the bit-precision of each measurement and each arithmetic operation is $O(\log(n) + P)$ (here $2^{-P}$ corresponds
to the desired relative error in the reconstruction of $x$), (c) the computational complexity of decoding is $O(k)$
arithmetic operations, and (d) if the reconstruction goal is simply to recover a single component of $x$ instead of all
of $x$, with high probability over $A$ this can be done in constant time. All constants above are independent of all
problem parameters other than the desired probability of success. For a wide range of parameters these properties are
information-theoretically order-optimal. In addition, our SHO-FA algorithm is robust to random noise, and (random)
approximate sparsity for a large range of $k$. In particular, suppose the measured vector equals $A(x + z) + e$, where $z$
and $e$ correspond respectively to the source tail and measurement noise. Under reasonable statistical assumptions on
$z$ and $e$ our decoding algorithm reconstructs $x$ with an estimation error of $O(\|z\|_1 + (\log k)^{[\ldots]}\|e\|_1)$. The SHO-FA
algorithm works with high probability over $A$, $z$, and $e$, and still requires only $O(k)$ steps and $O(k)$ measurements
over $O(\log(n))$-bit numbers. This is in contrast to most existing algorithms which focus on the "worst-case" $z$
model, where it is known that $\Omega(k \log(n/k))$ measurements over $O(\log(n))$-bit numbers are necessary. Our algorithm
has good empirical performance, as validated by simulations.



I. Introduction 

In recent years, spurred by the seminal work on compressive sensing of [1], [?], much attention has focused on
the problem of reconstructing a length-$n$ "compressible" vector $x$ over $\mathbb{R}$ with fewer than $n$ linear measurements.
In particular, it is known (e.g. [4]) that with $m = O(k \log(n/k))$ linear measurements one can computationally
efficiently² obtain a vector $\hat{x}$ such that the reconstruction error $\|x - \hat{x}\|_1$ is $O(\|x - x_{k^*}\|_1)$,³ where $x_{k^*}$ is the
best possible $k$-sparse approximation to $x$ (specifically, the $k$ non-zero terms of $x_{k^*}$ correspond to the $k$ largest
components of $x$ in magnitude, hence $x - x_{k^*}$ corresponds to the "tail" of $x$). A number of different classes of
algorithms are able to give such performance, such as those based on $\ell_1$-optimization (e.g. [1], [?]), and those based
on iterative "matching pursuit" (e.g. [8], [9]). Similar results, with an additional additive term in the reconstruction
error, hold even if the linear measurements themselves also have noise added to them (e.g. [3], [?]). The fastest of
these algorithms use ideas from the theory of expander graphs, and have running time $O(n \log(n/k))$ [10]-[12].
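
As a small illustration of the quantities in the guarantee above, the following sketch computes the best $k$-sparse approximation $x_{k^*}$ and the $\ell_1$ norm of the tail $x - x_{k^*}$. The function name `best_k_sparse` and the example vector are hypothetical and chosen only for illustration; they do not come from any of the cited works.

```python
import numpy as np

def best_k_sparse(x, k):
    """Return x_{k*}: keep the k largest-magnitude entries of x, zero the rest."""
    x = np.asarray(x, dtype=float)
    approx = np.zeros_like(x)
    if k > 0:
        keep = np.argsort(np.abs(x))[-k:]  # indices of the k largest |x_j|
        approx[keep] = x[keep]
    return approx

# Illustrative vector: three large entries plus a small "tail".
x = np.array([5.0, -0.1, 0.0, 3.0, 0.2, -4.0])
x_k = best_k_sparse(x, k=3)
tail_l1 = np.sum(np.abs(x - x_k))  # ||x - x_{k*}||_1, the l1 norm of the tail
```

A reconstruction $\hat{x}$ meeting the guarantee would have $\|x - \hat{x}\|_1$ within a constant factor of `tail_l1`.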

The class of results summarized above are indeed very strong - they hold for all $x$ vectors, including those with
"worst-case tails", i.e. even vectors where the components of $x$ smaller than the $k$ largest coefficients (which can
be thought of as "source tail") are chosen in a maximally worst-case manner. In fact [13] proves that to obtain
a reconstruction error that scales linearly with the $\ell_1$-norm of $z$, the tail of $x$, requires $\Omega(k \log(n/k))$ linear
measurements.

¹ Also, SHO-FA sho good! In fact, it's all $O(k)$!

² The caveat is that the reconstruction techniques require one to solve an LP. Though polynomial-time algorithms to solve LPs are known,
they are generally considered to be impractical for large problem instances.

³ In fact this is the so-called $\ell_1 < C\ell_1$ guarantee. One can also prove stronger $\ell_2 < C\ell_1/\sqrt{k}$ reconstruction guarantees for algorithms with
similar computational performance, and it is known that an $\ell_2 < C\ell_2$ reconstruction guarantee is not possible if the algorithm is required to
be zero-error [5], but is possible if some (small) probability of error is allowed [?].
Number of measurements: However, depending on the application, such a lower bound based on "worst-case $z$"
may be unduly pessimistic. For instance, it is known that if $x$ is exactly $k$-sparse (has exactly $k$ non-zero
components, and hence $z = 0$), then based on Reed-Solomon codes [14] one can efficiently reconstruct $x$ with
$O(k)$ noiseless measurements (e.g. [15]) via algorithms with decoding time-complexity $O(n \log(n))$, or via codes
such as in [16], [17] with $O(k)$ noiseless measurements with decoding time-complexity $O(n)$.⁴ In the regime where
$k = \Theta(n)$, [18] use the "sparse-matrix" techniques of [10]-[12] to demonstrate that $O(k) = O(n)$ measurements
suffice to reconstruct $x$.

Noise: Even if the source is not exactly $k$-sparse, a spate of recent work has taken a more information-theoretic view
than the coding-theoretic/worst-case point-of-view espoused by much of the compressive sensing work thus far.
Specifically, suppose the length-$n$ source vector is the sum of any exactly $k$-sparse vector $x$ and a "random" source
noise vector $z$ (and possibly the linear measurement vector $A(x + z)$ also has a "random" measurement noise vector
$e$ added to it). Then as long as the noise variances are not "too much larger" than the signal power, the work of [19]
demonstrates that $O(k)$ measurements suffice (though the proofs in [19] are information-theoretic and existential -
the corresponding "typical-set decoding" algorithms require time exponential in $n$). Indeed, even the work of [13],
whose primary focus was to prove that $\Omega(k \log(n/k))$ linear measurements are necessary to reconstruct $x$ in the
worst case, also notes as an aside that if $x$ corresponds to an exactly sparse vector plus random noise, then in
fact $O(k)$ measurements suffice. The work in [20], [21] examines this phenomenon information-theoretically by
drawing a nice connection with the Rényi information dimension $d(X)$ of the signal/noise, and [22] show how to
computationally efficiently achieve this performance by exactly reconstructing $x$ with $O(d(X)n) + o(n)$ samples in
time $O(n)$. Corresponding lower bounds showing that $\Omega(k \log(n/k))$ samples are required in the higher noise regime
are provided in [23], [24].



Number of measurement bits: However, most of the works above focus on minimizing the number of linear
measurements in $Ax$, rather than the more information-theoretic view of trying to minimize the number of bits
in $Ax$ over all measurements. Some recent work attempts to fill this gap - notably "Counting Braids" [25],
[26] (this work uses "multi-layered non-linear measurements"), and "one-bit compressive sensing" [27], [28] (the
corresponding decoding complexity is somewhat high (though still polynomial-time) since it involves solving an
LP).

Decoding time-complexity: The emphasis of the discussion thus far has been on the number of linear
measurements/bits required to reconstruct $x$. The decoding algorithms in most of the works above have decoding
time-complexities⁵ that scale at least linearly with $n$. In regimes where $k$ is significantly smaller than $n$, it is natural to
wonder whether one can do better. Indeed, algorithms based on iterative techniques answer this in the affirmative.
These include Chaining Pursuit [29], group-testing based algorithms [30], and Sudocodes [31] - each of these has
decoding time-complexity that can be sub-linear in $n$ (but at least $\Theta(k \log(k) \log(n))$), but each requires at least
$O(k \log(n))$ linear measurements.

Database query: Finally, we consider a database query property that is not often of primary concern in the 
compressive sensing literature. That is, suppose one is given a compressive sensing algorithm that is capable of 
reconstructing x with the desired reconstruction guarantee. Now suppose that one instead wishes to reconstruct, 
with reasonably high probability, just "a few" (constant number) specific components of x, rather than all of it. 
Is it possible to do so even faster (say in constant time) - for instance, if the measurements are in a database, 
and one wishes to query it in a computationally efficient manner? If the matrix A is "dense" (most of its entries 
are non-zero) then one can directly see that this is impossible. However, several compressive sensing algorithms 
(for instance [18]) are based on "sparse" matrices A, and it can be shown that in fact these algorithms do indeed 
have this property "for free" (as indeed does our algorithm), even though the authors do not analyze this. As can 
be inferred from the name, this database query property is more often considered in the database community, for 
instance in the work on IBLTs [32].

⁴ In general the linear systems produced by Reed-Solomon codes are ill-conditioned, which causes problems for large $n$.

⁵ For ease of presentation, in accordance with common practice in the literature, in this discussion we assume that the time-complexity of
performing a single arithmetic operation is constant. Explicitly taking the complexity of performing finite-precision arithmetic into account 
adds a multiplicative factor (corresponding to the precision with which arithmetic operations are performed) in the time-complexity of most 
of the works, including ours. 
A. Our contributions

Conceptually, the "iterative decoding" technique we use is not new. Similar ideas have been used in various
settings in, for instance, [16], [32]-[34]. However, to the best of our knowledge, no prior work has the same
performance as our work - namely, an information-theoretically order-optimal number of measurements, bits in
those measurements, and time-complexity, for the problem of reconstructing a sparse signal (or sparse signal with
a noisy tail and noisy measurements) via linear measurements (along with the database query property).⁶ The key
to this performance is our novel design of "sparse random" linear measurements, as described in Section II.

To summarize, the desirable properties of SHO-FA are that with high probability:⁷

• Number of measurements: For every $k$-sparse $x$, with high probability over $A$, $O(k)$ linear measurements
suffice to reconstruct $x$. This is information-theoretically order-optimal.

• Number of measurement bits: The total number of bits in $Ax$ required to reconstruct $x$ to a relative error
of $2^{-P}$ is $O(k(\log(n) + P))$. This is information-theoretically order-optimal for any $k = O(n^{1-\Delta})$ (for any
$\Delta > 0$).

• Decoding time-complexity: The total number of arithmetic operations required is $O(k)$. This is
information-theoretically order-optimal.

• Database queries: With constant probability $1 - \epsilon$ any single database query can be answered in constant
time.

• Noise: Suppose $z$ and $e$ have i.i.d. components⁹ drawn respectively from $\mathcal{N}(0, \sigma_z^2)$ and $\mathcal{N}(0, \sigma_e^2)$. For
$k = O(n^{1-\Delta})$ for any $\Delta > 0$, there is a modified version of SHO-FA (mod-SHO-FA) that with high probability reconstructs
$x$ with an estimation error of $O(\|z\|_1 + (\log k)^{[\ldots]}\|e\|_1)$.

• Practicality: As validated by simulations (shown in the Appendix), most of the constant factors involved above
are not large, and are in fact significantly smaller than the explicit constants that can be calculated via our 
analysis. 

• Different bases: As is common in the compressive sensing literature, our techniques generalize directly to the 
setting wherein x is sparse in an alternative basis (say, for example, in a wavelet basis). We defer discussion 
of this until after the description of SHO-FA. 

• Universality: While we present a specific ensemble of matrices over which SHO-FA operates, we argue that in
fact similar algorithms work over fairly general ensembles of "sparse random matrices", and further that such
matrices can occur in applications, for instance in wireless MIMO systems [38]. Again, we defer discussion
of this issue to after our description of SHO-FA.



B. Special acknowledgements 

In particular, the bounds on the minimum number of measurements required for "worst-case" recovery and the
corresponding discussion on recovery of signals with "random tails" in [13] led us to consider this problem in the
first place. Equally, the class of compressive sensing codes in [18], which in turn build upon the constructions of
expander codes in [33], have been influential in leading us to this work. While the model in [34] differs from the
one in this work, the techniques therein are of significant interest in our work. The analysis in [?] of the number of
disjoint components in certain classes of random graphs, and also the analysis of how noise propagates in iterative
decoding, is potentially useful in sharpening our results. We elaborate on these in Section III.



⁶ While writing this paper, we became aware of a parallel work by Pawar and Ramchandran [35] that seems to rely on ideas similar to our
work and achieves similar performance guarantees. However, at the time of submission, a preprint of the work by Pawar and Ramchandran
is not available for us to compare and contrast the two results.

⁷ For most of the properties, we show that this probability is at least $1 - 1/k^{[\ldots]}$, though we explicitly prove only $1 - O(1/k)$.

⁸ The constant $\epsilon$ can be made arbitrarily close to zero, at the cost of a multiplicative factor $O(1/\epsilon)$ in the number of measurements required.
In fact, if we allow the number of measurements to scale as $O(k \log(k))$, we can support any number of database queries, each in constant
time, with the probability of every one being answered correctly being at least $1 - \epsilon$.

⁹ Even if the statistical distribution of the components of $z$ and $e$ are not i.i.d. Gaussian, statements with a similar flavor can be made.
For instance, pertaining to the effect of the distribution of $z$, it turns out that our analysis is sensitive only to the distribution of the sum of
components of $z$, rather than the components themselves. Hence, for example, if the components of $z$ are i.i.d. non-Gaussian, it turns out
that via the Berry-Esseen theorem [36] one can derive similar results to the ones derived in this work. In another direction, if the components
of $z$ are not i.i.d. but do satisfy some "regularity constraints", then using Bernstein's inequality [37] one can again derive analogous results.
However, these arguments are more sensitive and outside the scope of this paper, where the focus is on simpler models.
Explanations and discussion: At the risk of missing much of the literature, and also perhaps oversimplifying nuanced
results, we summarize in this table many of the strands of work preceding this paper and related to it - not all results from
each work are represented in this table. The second to the fifth columns respectively reference whether the measurement
matrix $A$, source $k$-sparse vector $x$, source noise $z$, and measurement noise $e$ are random ($R$) or deterministic ($D$) -
a "-" in a column corresponding to noise indicates that that work did not consider that type of noise. An entry "P.L."
stands for "Power Law" decay in columns corresponding to $x$ and $z$. For achievability schemes, in general $D$-type
results are stronger than $R$-type results, which in turn are stronger than [...]-type results. This is because a $D$-type result
for the measurement matrix indicates that there is an explicit construction of a matrix that satisfies the required goals,
whereas the $R$-type results generally indicate that the result is true with high probability over measurement matrices.
Analogously, a $D$ in the columns corresponding to $x$, $z$ or $e$ indicates that the scheme is true for all vectors, whereas
an $R$ indicates that it is true for random vectors from some suitable ensemble. For converse results, the opposite is
true: [...]-type results are stronger than $R$-type results, which are stronger than $D$-type results. An entry $\mathcal{N}$ indicates the normal
distribution - the results of [24] and [23] are converses for matrices with i.i.d. Gaussian entries. An entry "sgn" denotes
(in the case of works dealing with one-bit measurements) that the errors are sign errors. The sixth column corresponds
to what the desired goal is. The strongest possible goal is to have exact reconstruction of $x$ (up to quantization error
due to finite-precision arithmetic), but this is not always possible, especially in the presence of noise. Other possible
goals include "Sup. Rec." (short for support recovery) of $x$, or that the reconstruction $\hat{x}$ of $x$ differs from $x$ as a
"small" function of $z$. It is known that if a deterministic reconstruction algorithm is desired to work for all $x$ and $z$,
then $\|\hat{x} - x\|_2 < O(\|z\|_2)$ is not possible with fewer than $\Omega(n)$ measurements [5], and that $\|\hat{x} - x\|_2 < O(\|z\|_1/\sqrt{k})$
implies $\|\hat{x} - x\|_1 < O(\|z\|_1)$. The reconstruction guarantees in [27], [28] unfortunately do not fall neatly in these
categories. The seventh column indicates what the probability of error is - i.e. the probability over any randomness in
$A$, $x$, $z$ and $e$ that the reconstruction goal in the sixth column is not met. In the eighth column, some entries are marked
$d(x + z)$ - this denotes the (upper) Rényi dimension of $x + z$ - in the case of exactly $k$-sparse vectors this equals $k$,
but for non-zero $z$ it depends on the distribution of $z$. The ninth column considers the computational complexity of the
algorithms - the entry [...] denotes the computational complexity of solving a linear program. The final column notes
whether the particular work referenced considers the precision of arithmetic operations, and if so, to what level.



The work that is conceptually the closest to SHO-FA is that of the Invertible Bloom Lookup Tables (IBLTs)
introduced by Goodrich and Mitzenmacher [32] (though our results were derived independently, and hence much of
our analysis follows a different line of reasoning). The data structures and iterative decoding procedure (called
"peeling" in [32]) used are structurally very similar to the ones used in this work. However the "measurements"
in IBLTs are fundamentally non-linear in nature - specifically, each measurement includes within it a "counter"
variable - it is not obvious how to implement this in a linear manner. Therefore, though the underlying graphical
structure of our algorithms is similar, the details of our implementation require new non-trivial ideas. Also, IBLTs
as described are not robust to either signal tails or measurement noise. Nonetheless, the ideas in [32] have been
influential in this work. In particular, the notion that an individual component of $x$ could be recovered in constant
time, a common feature of Bloom filters, came to our notice due to this work. Also, the analysis (via "2-cores of
random hypergraphs") in [32] of the leading constant factor in the number of measurements $m$ is tighter than that
presented in our work. Since the emphasis of our work is on order optimality, we choose to present our (fairly
straightforward) analysis, and leave for future work the incorporation of their more technical line of reasoning into
our algorithm.

II. Exactly $k$-sparse $x$ and noiseless measurements

We first consider the simpler case when the source signal is exactly $k$-sparse and the measurements are noiseless,
i.e., $y = Ax$, and both $z$ and $e$ are all-zero vectors. The intuition presented here carries over to the scenario wherein
both $z$ and $e$ are non-zero, considered separately in Section III.

For $k$-sparse input vectors $x \in \mathbb{R}^n$ let the set $S(x)$ denote its support, i.e., the set of indices of its nonzero values $\{j : x_j \neq 0\}$.
Recall that in our notation, for some $m$, a measurement matrix $A \in \mathbb{R}^{m \times n}$ is chosen probabilistically. This matrix
operates on $x$ to yield the measurement vector $y \in \mathbb{R}^m$ as $y = Ax$. The decoder takes the vector $y$ as input and
outputs the reconstruction $\hat{x} \in \mathbb{R}^n$ - it is desired that $\hat{x}$ equal $x$ (with up to $P$ bits of precision) with high probability
over the choice of measurement matrices.

In this section, we describe a probabilistic construction of the measurement matrix A and a reconstruction 
algorithm SHO-FA that achieves the following guarantees. 

Theorem 1. Let $k < n$. There exists a reconstruction algorithm SHO-FA for $A \in \mathbb{R}^{m \times n}$ with the following
properties:

1) For every $x \in \mathbb{R}^n$, with probability $1 - O(1/k)$ over the choice of $A$, SHO-FA produces a reconstruction $\hat{x}$
such that $\|x - \hat{x}\|_1/\|x\|_1 \le 2^{-P}$.

2) $m = ck$ for some $c > 0$.

3) Expected number of steps required by SHO-FA is $O(k)$.

4) Expected number of bitwise arithmetic operations required by SHO-FA is $O(k(\log n + P))$.

We prove the above theorem in the remainder of this section.

A. High-level intuition 

If $m = O(n)$, the task of reconstructing $x$ from $y = Ax$ appears similar to that of syndrome decoding of a
channel code of rate $n/m$ [44]. It is well-known [45] that channel codes based on bipartite expander graphs, i.e.,
bipartite graphs with good expansion guarantees for all sets of size less than or equal to $k$, allow for decoding in
a number of steps that is linear in the size of $x$. In particular, given such a bipartite expander graph with $n$ nodes
on the left and $m$ nodes on the right, choosing the matrix $A$ as an $m \times n$ binary matrix with non-zero values in the
locations where the corresponding pair of nodes in the graph has an edge is known to result in codes with rate and
relative minimum distance that is linear in $n$.

Motivated by this, [18] explore a measurement design that is derived from expander graphs and show that
$O(k \log(n/k))$ measurements suffice, and $O(k)$ iterations with overall decoding complexity of $\Theta(n \log [\ldots])$.

It is tempting to think that perhaps an optimized application of expander graphs could result in a design that
requires only $O(k)$ measurements. However, we show that in the compressive sensing setting, where,
typically, $k = o(n)$, it is not possible to satisfy the desired expansion properties. In particular, if one tries to mimic
the approach of [18], one would need bipartite expanders such that all sets of size $k$ on one side of the graph
"expand" - we show in Lemma 2 that this is not possible. As such, this result may be of independent interest for
other works that require similar graphical constructions (for instance the "magical graph" constructions of [46], or
the expander code constructions of [33] in the high-rate regime).

Instead, one of our key ideas is that we do not really need "true" expansion. Instead, we rely on a notion of
approximate expansion that guarantees expansion for most $k$-sized sets (and their subsets) of nodes on the left of
our bipartite graph. We do so by showing that any set of size at most $k$, with high probability over suitably chosen
measurement matrices, expands to the desired amount. Probabilistic constructions turn out to exist for our desired
property.¹¹ Such a construction is shown in Lemma 1.

¹⁰ The work of [10] is related - it also relies on bipartite expander graphs, and has similar performance for exactly $k$-sparse vectors.
But [10] can also handle general approximately $k$-sparse vectors, unlike [18]. However, our algorithms are closer in spirit to those of [18],
and hence we focus on this work.

Our second key idea is that in order to be able to recover all the $k$ non-zero components of $x$ with at most
$O(k)$ steps in the decoding algorithm, it is necessary (and sufficient) that on average, the decoder reconstructs one
previously undecoded non-zero component of $x$, say $x_j$, in $O(1)$ steps in the decoding algorithm. For $k = o(n)$
the algorithm does not even have enough time to write out all of $x$, but only its non-zero values. To achieve such
efficient identification of $x_j$, we go beyond the 0/1 matrices used in almost all prior work on compressive sensing
based on expander graphs.¹² Instead, we use distinct values in each row for the non-zero values in $A$, so that if only
one non-zero $x_j$ is involved in the linear measurement involving a particular $y_i$ (a situation that we demonstrate
happens in a constant fraction of $y_i$), one can identify which $x_j$ it must be in $O(1)$ time. Our decoding then proceeds
iteratively, by identifying such $x_j$ and canceling their effects on $y_i$, and terminates after $O(k)$ steps after all non-zero
$x_j$ and their locations have been identified (since we require our algorithm to work with high probability for all
$x$, we also add "verification" measurements - this only increases the total number of measurements by a constant
factor). Our calculations are precise to $O(\log(n) + P)$ bits - the first term in this comes from requirements necessary
for computationally efficient identification of non-zero $x_j$, and the last term from the requirement that
the reconstructed vector be correct up to $P$-precision. Hence the total number of bits over all measurements is
$O(k(\log(n) + P))$. Note that this is information-theoretically order-optimal, since even specifying $k$ locations in a
length-$n$ vector requires $\Omega(k \log(n/k))$ bits, and specifying the value of the non-zero locations so that the relative
reconstruction error is $O(2^{-P})$ requires $\Omega(kP)$ bits.
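
The identify-and-cancel idea above can be sketched in a few lines. The code below is only an illustrative toy under stated assumptions, not the SHO-FA construction itself: it assumes a real-valued $x$, gives column $j$ the hypothetical phase $\theta_j = (\pi/2)(j+1)/(n+1)$ for its identification weight, uses random unit-magnitude verification weights, and picks $d = 8$ neighbours per left node and $m = 10k$ right nodes purely for convenience. The names `make_measurements` and `peel_decode` are invented for this sketch.

```python
import numpy as np
from collections import deque

def make_measurements(x, m, d, rng):
    """Encode real x with a random left-regular sparse matrix (illustrative choices)."""
    n = len(x)
    # Each left node j gets d distinct right neighbours (assumption: sampled uniformly).
    nbrs = np.array([rng.choice(m, size=d, replace=False) for _ in range(n)])
    # Identification weight of column j: unit magnitude, phase theta_j in (0, pi/2).
    theta = (np.pi / 2) * (np.arange(n) + 1) / (n + 1)
    # Verification weights: random unit-magnitude numbers (assumption).
    w_ver = np.exp(1j * rng.uniform(0, 2 * np.pi, size=(n, d)))
    y_id = np.zeros(m, dtype=complex)
    y_ver = np.zeros(m, dtype=complex)
    for j in np.flatnonzero(x):
        y_id[nbrs[j]] += x[j] * np.exp(1j * theta[j])
        y_ver[nbrs[j]] += x[j] * w_ver[j]
    return nbrs, theta, w_ver, y_id, y_ver

def peel_decode(n, nbrs, theta, w_ver, y_id, y_ver, tol=1e-9):
    """Repeatedly find a measurement with one unknown x_j, read it off, cancel it."""
    y_id, y_ver = y_id.copy(), y_ver.copy()
    step = (np.pi / 2) / (n + 1)
    x_hat = {}                          # sparse output: index -> value
    queue = deque(range(len(y_id)))
    while queue:
        i = queue.popleft()
        if abs(y_id[i]) < tol:          # nothing left to decode at this measurement
            continue
        ang, sign = np.angle(y_id[i]), 1.0
        if ang < 0:                     # a negative real x_j shifts the phase by pi
            ang, sign = ang + np.pi, -1.0
        j = int(np.rint(ang / step)) - 1
        if not 0 <= j < n or j in x_hat or abs(ang - theta[j]) > tol:
            continue                    # phase matches no undecoded column
        slot = np.flatnonzero(nbrs[j] == i)
        if slot.size == 0:
            continue                    # column j is not attached to measurement i
        value = sign * abs(y_id[i])
        if abs(y_ver[i] - value * w_ver[j, slot[0]]) > tol:
            continue                    # verification failed: several x_j are mixed
        x_hat[j] = value
        y_id[nbrs[j]] -= value * np.exp(1j * theta[j])   # cancel x_j everywhere
        y_ver[nbrs[j]] -= value * w_ver[j]
        queue.extend(nbrs[j])           # these measurements may now isolate another x_j
    return x_hat

rng = np.random.default_rng(0)
n, k, d = 1000, 20, 8
m = 10 * k
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)
nbrs, theta, w_ver, y_id, y_ver = make_measurements(x, m, d, rng)
x_hat = peel_decode(n, nbrs, theta, w_ver, y_id, y_ver)
max_err = max(abs(x_hat.get(j, 0.0) - x[j]) for j in range(n))
```

In this sketch each decoded component costs $O(d)$ work, and the output is stored as a dictionary of non-zero entries so that the decoder never touches all $n$ coordinates.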

We now present our SHO-FA algorithm in two stages. We first use our first key idea (of "approximate"
expansion) in Section II-B to describe some properties of bipartite expander graphs with certain parameters. We then
show in Section II-C how these properties, via our second key idea (of efficient identification), can be used by SHO-FA
to obtain desirable performance.

B. Description of graph properties 

We first construct a bipartite graph $\mathcal{G}$ (see Example 1 in the Appendix) with some desirable properties outlined
below. We then show in Lemmas 1 and 3 that such graphs exist (Lemma 2 shows the non-existence of graphs with
even stronger properties). In Section II-C we then use these graph properties in the SHO-FA algorithm. To simplify
notation in what follows (unless otherwise specified) we omit rounding numbers resulting from taking ratios or
logarithms, with the understanding that the corresponding inaccuracy introduced is negligible compared to the result
of the computation. Also, for ease of exposition, we fix various internal parameters to "reasonable" values rather
than optimizing them to obtain "slightly" better performance at the cost of obfuscating the explanations - whenever
this happens we shall point it out parenthetically. Lastly, let $\epsilon$ be any "small" positive number, corresponding to
the probability of a certain "bad event".
Properties of $\mathcal{G}$:

1) Construction of a left-regular bipartite graph: The graph $\mathcal{G}$ is chosen uniformly at random from the set of
bipartite graphs with $n$ nodes on the left and $m'$ nodes on the right, such that each node on the left has degree
$d > 7$. In particular, $m'$ is chosen to equal $ck$ for some design parameter $c$ to be specified later as part of code
design.

2) Edge weights for "identifiability": For each node on the right, the weights of the edges attached to it are
required to be distinct. In particular, each edge weight is chosen as a complex number of unit magnitude,
and phase between $0$ and $\pi/2$. Since there are a total of $dn$ edges in $\mathcal{G}$, choosing distinct phases for each
edge attached to a node on the right requires at most $\log(dn)$ bits of precision (though on average there are
about $dn/m'$ edges attached to a node on the right, and hence on average one needs about $\log(dn/m')$ bits
of precision).



3) $S(x)$-expansion: With high probability over $\mathcal{G}$ defined in Property 1 above, for any set $S(x)$ of $k$ nodes on
the left, the number of nodes neighbouring those in any $S'(x) \subseteq S(x)$ is required to be at least $2d/3$ times
the size of $S'(x)$.¹³ The proof of this statement is the subject of Lemma 1.

4) "Many" $S(x)$-leaf nodes: For any set $S(x)$ of at most $k$ nodes on the left of $\mathcal{G}$, we call any node on the right
of $\mathcal{G}$ an $S(x)$-leaf node if it has exactly one neighbor in $S(x)$, and we call it an $S(x)$-non-leaf node if it has
two or more neighbours in $S(x)$. (If the node on the right has no neighbours in $S(x)$, we call it an $S(x)$-zero
node.) Assuming $S(x)$ satisfies the expansion condition in Property 3 above, it can be shown that at least a
fraction 1/2 of the nodes that are neighbours of any $S'(x) \subseteq S(x)$ are $S'(x)$-leaf nodes.¹⁴ The proof of this
statement is the subject of Lemma 3.

¹¹ In fact similar properties have been considered before in the literature - for instance [46] constructed so-called "magical graphs" with
similar properties. Our contribution is the way we use this property for our needs.

¹² It can be argued that such a choice is a historical artifact, since error-correcting codes based on expanders were originally designed to
work over the binary field $\mathbb{F}_2$. There is no reason to stick to this convention when, as now, computations are done over $\mathbb{R}$.
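
The leaf / non-leaf classification and the $2d/3$ expansion check in Properties 3 and 4 can be made concrete as follows. This is a minimal sketch: the adjacency below is a hypothetical graph with $n = 5$ left nodes of degree $d = 3$ and $m' = 4$ right nodes, invented for illustration, and the helper names `classify_right_nodes` and `expands` are not from the paper.

```python
from collections import Counter

def classify_right_nodes(neighbours, subset):
    """Split the right neighbours of `subset` into leaf and non-leaf nodes."""
    counts = Counter(r for j in subset for r in neighbours[j])
    leaves = {r for r, c in counts.items() if c == 1}       # exactly one neighbour in subset
    non_leaves = {r for r, c in counts.items() if c >= 2}   # two or more neighbours in subset
    return leaves, non_leaves

def expands(neighbours, subset, d):
    """Check that `subset` has at least (2d/3)|subset| right neighbours."""
    reached = set().union(*(neighbours[j] for j in subset))
    return 3 * len(reached) >= 2 * d * len(subset)          # integer form of the inequality

# Hypothetical left-regular graph: left node -> set of right neighbours.
neighbours = {1: {1, 2, 3}, 2: {2, 3, 4}, 3: {1, 2, 4}, 4: {1, 3, 4}, 5: {2, 3, 4}}

good = classify_right_nodes(neighbours, {1, 5})   # subset that expands
bad = classify_right_nodes(neighbours, {2, 5})    # two nodes with identical neighbourhoods
```

Here `{1, 5}` reaches four right nodes, two of which are leaves, while `{2, 5}` reaches only three right nodes and has no leaves, so a decoder that relies on leaf nodes would have nowhere to start on that subset.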

Example 1: We now demonstrate via the following toy example in Figures 1 and 2 a graph $\mathcal{G}$ satisfying
Properties 1-4.

Figure 1. Property 1: Bipartite "approximate expander" graph with $n = 5$ nodes on the left, and $m' = 4$ nodes on the right. Each node on
the left has degree 3. Property 2: The thicknesses of the edges represent the weights assigned to the edges. In particular, it is required that
for each node on the right, the edges incoming have distinct weights. In this example, the thinnest edges are assigned a weight of $1$, the next
thickest edges have a weight $e^{\iota\pi/6}$, the next thickest edges have weight $e^{\iota 2\pi/6} = e^{\iota\pi/3}$, and the thickest edges have weight $e^{\iota 3\pi/6} = e^{\iota\pi/2}$.

Figure 2. Property 3: We require that most sets $S'(x)$ of at most $|S(x)| = k = 2$ nodes on the left in the graph $\mathcal{G}$ in Figure 1 have at
least $2|S'(x)|$ neighbors on the right. In the graph in Figure 1 it can be manually verified that most sets $S'(x)$ of size at most 2 have
at least $2|S'(x)|$ neighbors. For example, Figure 2(a) focuses on the subset $S'(x) = \{1, 5\}$ of nodes on the left side of $\mathcal{G}$ in Figure 1.
This particular $S'(x)$ has 4 neighbours, and all its single-node subsets have 3 neighbours. The only $S'(x)$ set of two or fewer nodes that does
not satisfy Property 3 is $\{2, 5\}$, as shown in Figure 2(b), since it has only $3 < 2 \times 2$ neighbours. Property 4: For sets $S'(x)$ that satisfy
Property 3 it can be manually verified that "many" of their neighbours are $S'(x)$-leaf nodes. For example, for $S'(x) = \{1, 5\}$, two out of
its four neighbours (i.e., a fraction 1/2) are $S'(x)$-leaf nodes, which satisfies the constraint that at least a fraction 1/2 of its neighbours be
$S'(x)$-leaf nodes. On the other hand, for $S'(x) = \{2, 5\}$ (which does not satisfy Property 3), none of its neighbours are $S'(x)$-leaf nodes.



□ 

We now state the Lemmas needed to make our arguments precise. First, we formalize the 5' (x) -expansion 
property defined in Property [3] 

Lemma 1. (Property 3 ($S(x)$-expansion)): Let $\epsilon > 0$ and $k < n \in \mathbb{N}$ be arbitrary, and let $c \in \mathbb{N}$ be fixed. Let Q be chosen uniformly at random from the set of all bipartite graphs with n nodes (each of degree d) on the left and $m' = ck$ nodes on the right. Then for any $S(x)$ of size at most k and any $S'(x) \subseteq S(x)$, with probability $1 - o(1/k)$ (over the random choice of Q) there are at least $2d/3$ times as many nodes neighbouring those in $S'(x)$ as there are in $S'(x)$.

^13 The expansion factor $2d/3$ is somewhat arbitrary. In our proofs, this can be replaced with any number strictly between half the degree and the degree of the left nodes, and indeed one can carefully optimize over such choices so as to improve the constant in front of the expected time-complexity/number of measurements of SHO-FA. Again, we omit this optimization since this can only improve the performance of SHO-FA by a constant factor.

^14 Yet again, this choice of 1/2 is a function of the choices made for the degree of the left nodes in Property 1 and the expansion factor 2 in Property 3. Again, we omit optimizing it.

Proof: Follows from a standard probabilistic method argument. Given for completeness in Appendix A.

Note here that, in contrast to the "usual" definition of "vertex expansion" [45] (wherein the expansion property is desired "for all" subsets of left nodes up to a certain size), Lemma 1 above only gives a probabilistic expansion guarantee for any subset of $S(x)$ of size k. In fact, Lemma 2 below shows that for the parameters of interest, "for all"-type expanders cannot exist.

Lemma 2. Let $k = o(n)$, and let $d > 0$ be an arbitrary constant. Let Q be an arbitrary bipartite graph with n nodes (each of degree d) on the left and $m'$ nodes on the right. Suppose that, for all sufficiently large n, each set $S(x)$ of k nodes on the left of Q has strictly more than $d/2$ times as many nodes neighbouring those in $S(x)$ as there are in $S(x)$. Then $m' = \Omega(k \log(n/k))$.

Proof: Follows from the Hamming bound in coding theory [44] and standard techniques for expander codes [33]. Proof in Appendix B.

Another way of thinking about Lemma 2 is that it indicates that if one wants a "for all" guarantee on expansion, then one has to return to the regime of $m' = O(k \log(n/k))$ measurements, as in "usual" compressive sensing.

Next, we formalize the "many $S(x)$-leaf nodes" property defined in Property 4. Recall that for any set $S(x)$ of at most k nodes on the left of Q, we call any node on the right of Q an $S(x)$-leaf node if it has exactly one neighbour in $S(x)$.

Lemma 3. Let $S(x)$ be a set of k nodes on the left of Q such that the number of nodes neighbouring those in any $S'(x) \subseteq S(x)$ is at least $2d/3$ times the size of $S'(x)$. Then at least a fraction 1/2 of the nodes that are neighbours of any $S'(x) \subseteq S(x)$ are $S'(x)$-leaf nodes.

Proof: Based on Lemma 1. Follows from a counting argument similar to those used in expander codes [33]. Proof in Appendix C.
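The two properties formalized in Lemmas 1 and 3 can be checked empirically on a sampled graph. The sketch below is only an illustration under stated assumptions: the graph is sampled by letting every left node pick d distinct right neighbours uniformly at random, and the helper names (`random_left_regular_graph`, `expansion_and_leaf_fraction`) and the parameter values are hypothetical. It also reflects the counting step behind Lemma 3: if a set $S'$ has $N \ge 2d|S'|/3$ neighbours of which $L$ are leaves, then the $d|S'|$ edges leaving $S'$ satisfy $d|S'| \ge L + 2(N - L)$, so $L \ge 2N - d|S'| \ge N/2$.

```python
import random

def random_left_regular_graph(n, m_prime, d, rng):
    """graph[j] lists the d distinct right neighbours of left node j."""
    return [rng.sample(range(m_prime), d) for _ in range(n)]

def expansion_and_leaf_fraction(graph, subset):
    """Return (|N(S')| / |S'|, fraction of N(S') that are S'-leaf nodes)."""
    hits = {}  # right node -> number of its neighbours that lie in `subset`
    for j in subset:
        for i in graph[j]:
            hits[i] = hits.get(i, 0) + 1
    leaves = sum(1 for count in hits.values() if count == 1)
    return len(hits) / len(subset), leaves / len(hits)

rng = random.Random(0)
n, k, d, c = 1000, 20, 3, 10          # illustrative sizes, not taken from the paper
graph = random_left_regular_graph(n, c * k, d, rng)
support = rng.sample(range(n), k)     # a random S(x)
expansion, leaf_fraction = expansion_and_leaf_fraction(graph, support)
print(f"expansion = {expansion:.2f} (target >= {2 * d / 3:.2f}), "
      f"leaf fraction = {leaf_fraction:.2f} (target >= 0.50)")
```

Whenever the measured expansion reaches $2d/3$, the leaf fraction is necessarily at least 1/2, whatever the sampled graph.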

C. Description of SHO-FA 

Given a graph Q satisfying Properties 1-4, we now describe our encoding and decoding procedure.
Measurement matrix:

Matrix structure and entries: The encoder's measurement matrix A is chosen based on the structure of Q (recall that Q has n nodes on the left and $m'$ nodes on the right). To begin with, the matrix A has $m = 2m'$ rows, and its non-zero values are unit-norm complex numbers. This choice of using complex numbers rather than real numbers in A is for notational convenience only. One can equally well choose a matrix $A'$ with $m = 4m'$ rows, and replace each row of A with two consecutive rows in $A'$ comprising respectively the real and imaginary parts of the rows of A. Since the components of x are real numbers, there is a bijection between $Ax$ and $A'x$: consecutive pairs of elements in $A'x$ are respectively the real and imaginary parts of the complex components of $Ax$. Also, as we shall see, the choice of unit-norm complex numbers ensures that "noise" due to finite precision arithmetic does not get "amplified".

In particular, corresponding to node i on the right-hand side of Q, the matrix A has two rows. The j-th entries of the $(2i-1)$-th and $2i$-th rows of A are denoted $a^{(I)}_{i,j}$ and $a^{(V)}_{i,j}$ respectively. (The superscripts (I) and (V) respectively stand for Identification and Verification, for reasons that shall become clearer when we discuss the process to reconstruct x.)

Identification entries: If Q has no edge connecting node j on the left with i on the right, then the identification entry $a^{(I)}_{i,j}$ is set to equal 0. Else, if there is indeed such an edge, $a^{(I)}_{i,j}$ is set to equal

$$a^{(I)}_{i,j} = e^{\iota j \pi/(2n)}. \qquad (1)$$

(Here $\iota$ denotes the positive square root of $-1$.) This entry $a^{(I)}_{i,j}$ can also be thought of as the weight of the edge in Q connecting j on the left with i on the right. In particular, the phase $j\pi/(2n)$ of $a^{(I)}_{i,j} = e^{\iota j\pi/(2n)}$ will be critical for our algorithm. As in Property 2 in Section II-B, our choice above guarantees distinct weights for all edges connected to a node i on the right.



Verification entries: Whenever the identification entry $a^{(I)}_{i,j}$ equals 0, we set the corresponding verification entry $a^{(V)}_{i,j}$ also to be zero. On the other hand, whenever $a^{(I)}_{i,j} \neq 0$, we set $a^{(V)}_{i,j}$ to equal $e^{\iota \theta^{(V)}_{i,j}}$ for $\theta^{(V)}_{i,j}$ chosen uniformly at random from $[0, \pi/2]$ (with $O(\log k)$ bits of precision).^15

Example 2: The matrix A corresponding to the graph Q in Example 1 is shown in Figure 3.



Figure 3. This $8 \times 5$ matrix denotes the A corresponding to the graph Q. Note that its primary purpose is expository: clearly, 8 measurements (or indeed, 16 measurements over $\mathbb{R}$) to reconstruct a 2-sparse vector of length 5 is too many! Nonetheless, this is just an artifact of the fact that n in this example is small. In fact, according to our proofs, even as n scales to infinity, the number of measurements required to reconstruct a 2-sparse vector (or in general a k-sparse vector for constant k) remains constant! Also, note that we do not use the assignment for the identification entries $a^{(I)}_{i,j}$ specified in (1), since doing so would result in ugly and not very illuminating calculations in Example 3 below. However, as noted in Remark 1, this is not critical: it is sufficient that distinct entries in the identification rows of the matrix be distinct.
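The construction of A described in this subsection can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `build_measurement_matrix`, the 0-based indexing, the `bits` grid used to quantise the verification phases, and the small graph below are all assumptions (the graph is hypothetical and is not the graph of Example 1). Identification entries follow equation (1).

```python
import cmath
import math
import random

def build_measurement_matrix(graph, m_prime, rng, bits=16):
    """Return the 2m' x n complex matrix A as a list of rows.

    graph[j] lists the (0-based) right neighbours of left node j.
    Rows 2i and 2i+1 (0-based) are the identification and verification
    rows of right node i.
    """
    n = len(graph)
    A = [[0j] * n for _ in range(2 * m_prime)]
    for j, neighbours in enumerate(graph):
        for i in neighbours:
            # Identification entry, equation (1), with the 1-based index j + 1.
            A[2 * i][j] = cmath.exp(1j * (j + 1) * math.pi / (2 * n))
            # Verification entry: phase drawn uniformly from a grid on [0, pi/2].
            theta = (math.pi / 2) * rng.randrange(2 ** bits + 1) / 2 ** bits
            A[2 * i + 1][j] = cmath.exp(1j * theta)
    return A

# Hypothetical left-regular graph: n = 5 left nodes of degree 3, m' = 4 right nodes.
graph = [[0, 1, 2], [1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]
A = build_measurement_matrix(graph, m_prime=4, rng=random.Random(0))
for row in A:
    # Print the phase of each non-zero entry as a multiple of pi.
    print("  ".join(f"{cmath.phase(v) / math.pi:4.2f}pi" if v else "   0  " for v in row))
```

Every non-zero entry has unit magnitude, and within an identification row the phases of the non-zero entries are distinct, which is all that Property 2 asks for.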



D. Reconstruction 

Since the measurement matrix A has interspersed identification and verification rows, this induces corresponding interspersed identification observations $y^{(I)}_i$ and verification observations $y^{(V)}_i$ in the observation vector $y = Ax$. Let $y^{(I)} = \{y^{(I)}_i\}$ denote the length-$m'$ identification vector over $\mathbb{C}$, and $y^{(V)} = \{y^{(V)}_i\}$ denote the length-$m'$ verification vector over $\mathbb{C}$.

Given the measurement matrix A and the observed identification and verification vectors $(y^{(I)}, y^{(V)})$, the decoder's task is to find any k-sparse vector $\hat{x}$ that results in the corresponding identification and verification vectors. We shall argue below that if we succeed, then with high probability over A (specifically, over the verification entries of A), this $\hat{x}$ must equal x.

To find such an $\hat{x}$ we design an iterative decoding scheme. This scheme starts by setting the initial guess for the reconstruction vector $\hat{x}$ to the all-zero vector, and identifies the set of non-zero indices of the verification vector $y^{(V)}$, which it takes as the initial neighbourly set. It initializes the gap vector $\tilde{y}$ as the values of the observation vector y restricted to the neighbourly set. In the first iteration it then picks a uniformly random index i from the neighbourly set. Next, the decoder attempts to recover the signal value at some index $j \in S(x)$ by looking at $\tilde{y}^{(I)}_i$ and "estimating" which j on the left of Q could have "caused the identification observation $\tilde{y}^{(I)}_i$". If index i is not an $S(x)$-leaf node, the decoder does not succeed in reconstructing $x_j$; it declares the iteration a failure, and starts the second iteration by again choosing a new uniformly random index i from the neighbourly set. On the other hand, if index i is an $S(x)$-leaf node, a signal value $x_j$ will indeed be recovered (and "verified" using the verification entry $a^{(V)}_{i,j}$ and the verification observation $\tilde{y}^{(V)}_i$).^16 The algorithm then updates the gap vector by subtracting the "contribution" of the coordinate $x_j$ to the measurements it influences (there are exactly three of them since the degree of the nodes on the left side of Q is 3), removes i from the neighbourly set, and finally picks a new random index i from the neighbourly set for the next (second) iteration. The decoder performs the above operations repeatedly until x has been completely recovered. We also show that (with high probability over A) this process does indeed terminate in $O(k)$ steps.

^15 This choice of precision for the verification entries contributes one term to our expression for the precision of arithmetic required. As we argue later in Section II-K, this choice of precision guarantees that if a single identification step returns a value for $x_j$, this is indeed correct with probability $1 - o(1/k)$. Taking a union bound over $O(k)$ indices corresponding to non-zero $x_j$ gives us an overall $1 - o(1)$ probability of success.

^16 As Ronald W. Reagan liked to remind us, "doveryai, no proveryai".

Example 3: Figures 4-8 show a sample decoding process for the matrix A as in Example 2, and the observed vector y shown in the figures. The example also demonstrates each of several possible scenarios the algorithm can find itself in, and how it deals with them.

E. Formal description of SHO-FA's reconstruction process

Our algorithm proceeds iteratively, and has an $O(k)$ overall (expected) number of iterations, with t being the variable indexing the iteration number.

1) We initialize by setting the signal estimate vector $\hat{x}(1)$ to the all-zeros vector $0^n$, and the residual measurement identification/verification vectors $\tilde{y}^{(I)}(1)$ and $\tilde{y}^{(V)}(1)$ to the decoder's observations $y^{(I)}$ and $y^{(V)}$. Let $\mathcal{D}(1)$, the initial neighbourly set, be the set of indices i corresponding to non-zero locations of the initial verification vector $\tilde{y}^{(V)}(1)$, i.e., the set $\{i \le m' : \tilde{y}^{(V)}_i(1) \neq 0\}$. This step alone already takes $O(k)$ steps, since merely reading y to check for the zero locations of $y^{(V)}$ takes that long.

2) The t-th decoding iteration accepts as its input the t-th signal estimate vector $\hat{x}(t)$, the t-th neighbourly set $\mathcal{D}(t)$, and the t-th residual measurement identification/verification vectors $(\tilde{y}^{(I)}(t), \tilde{y}^{(V)}(t))$. In $O(1)$ steps it outputs the $(t+1)$-th signal estimate vector $\hat{x}(t+1)$, the $(t+1)$-th neighbourly set $\mathcal{D}(t+1)$, and the $(t+1)$-th residual measurement identification/verification vectors $(\tilde{y}^{(I)}(t+1), \tilde{y}^{(V)}(t+1))$ after performing the following steps sequentially (each of which takes at most a constant number of atomic steps):

3) Pick a random $i(t)$: The decoder picks an element $i(t)$ uniformly at random from the t-th neighbourly set $\mathcal{D}(t)$.

4) Compute angles $\theta^{(I)}(t)$ and $\theta^{(V)}(t)$: Let the current identification and verification angles be defined respectively as the phases of the residual identification and verification entries being considered in that step, as follows:

$$\theta^{(I)}(t) = \angle\big(\tilde{y}^{(I)}_{i(t)}(t)\big), \qquad \theta^{(V)}(t) = \angle\big(\tilde{y}^{(V)}_{i(t)}(t)\big).$$

Here $\angle(\cdot)$ computes the phase of a complex number (up to $\Theta(\max\{\log(n/k), \log k\})$ bits of precision).^17

5) Check if the current identification and verification angles correspond to a valid and unique $x_j$: For this, we check at most two things (both calculations are done up to the precision specified in the previous step).

a) First, we check if $j(t) = \theta^{(I)}(t)(2n/\pi)$ is an integer, and the corresponding $j(t)$-th element of the $i(t)$-th row is non-zero. If so, we have "tentatively identified" that the $i(t)$-th component of $\tilde{y}$ is a leaf node of the currently unidentified non-zero components of x, and in particular is connected to the $j(t)$-th node on the left, and the algorithm proceeds to the next step below. If not, we simply increment t by 1 and return to Step 3.

b) Next, we verify our estimate from the previous step. If $a^{(V)}_{i(t),j(t)}\, \tilde{y}^{(I)}_{i(t)}(t) / a^{(I)}_{i(t),j(t)} = \tilde{y}^{(V)}_{i(t)}(t)$, the verification test passes, and the algorithm proceeds to the next step below. If not, we simply increment t by 1 and return to Step 3.

6) Update $\hat{x}(t+1)$, $\mathcal{D}(t+1)$, $\tilde{y}^{(I)}(t+1)$, and $\tilde{y}^{(V)}(t+1)$: In particular, at most 3 components of each of these vectors need to be updated. Specifically, $\hat{x}_{j(t)}(t+1)$ equals $\tilde{y}^{(I)}_{i(t)}(t) / a^{(I)}_{i(t),j(t)}$. The (at most three) neighbours of $\hat{x}_{j(t)}(t)$ are removed from the neighbourly set to get the neighbourly set $\mathcal{D}(t+1)$. And finally (three) values each of $\tilde{y}^{(I)}(t+1)$ and $\tilde{y}^{(V)}(t+1)$ are updated from those of $\tilde{y}^{(I)}(t)$ and $\tilde{y}^{(V)}(t)$ (those corresponding to the neighbours of $\hat{x}_{j(t)}(t)$) by subtracting out $\hat{x}_{j(t)}(t)$ multiplied by the appropriate coefficients of A.

7) Termination: The algorithm stops when the neighbourly set is empty, and outputs the last $\hat{x}(t)$.

^17 Roughly, the former term guarantees that the identification angle is calculated precisely enough, and the latter that the verification angle is calculated precisely enough.
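Steps 1-7 above can be sketched as a short peeling decoder. This is a hedged illustration, not the authors' code: exact bit-precision arithmetic is replaced by floating point with tolerances (`TOL` and `1e-6` are assumptions), the sign handling follows Remark 3 below, right nodes are dropped from the neighbourly set once their verification residual vanishes (which matches the worked example in Figures 7 and 8), and the list-based neighbourly set does not give the constant-time updates claimed for the real algorithm. The names `make_entries`, `measure`, `decode` and the small graph are hypothetical.

```python
import cmath
import math
import random

TOL = 1e-9  # numerical tolerance standing in for the paper's finite bit precision

def make_entries(graph, m_prime, rng):
    """ident[i][j], verif[i][j]: non-zero entries of right node i for left node j."""
    n = len(graph)
    ident = [{} for _ in range(m_prime)]
    verif = [{} for _ in range(m_prime)]
    for j, neighbours in enumerate(graph):
        for i in neighbours:
            ident[i][j] = cmath.exp(1j * (j + 1) * math.pi / (2 * n))  # equation (1)
            verif[i][j] = cmath.exp(1j * rng.uniform(0, math.pi / 2))
    return ident, verif

def measure(entries, x):
    return [sum(a * x[j] for j, a in row.items()) for row in entries]

def decode(graph, ident, verif, y_ident, y_verif, rng, max_iters=10_000):
    n = len(graph)
    x_hat = [0.0] * n
    res_i, res_v = list(y_ident), list(y_verif)            # residual (gap) vectors
    neighbourly = [i for i, v in enumerate(res_v) if abs(v) > TOL]   # step 1
    for _ in range(max_iters):
        if not neighbourly:                                 # step 7
            break
        i = rng.choice(neighbourly)                         # step 3
        phase = cmath.phase(res_i[i]) % (2 * math.pi)       # step 4
        sign = 1.0
        if phase > math.pi / 2 + TOL:                       # a negative x_j adds pi to the phase
            phase, sign = phase - math.pi, -1.0
        j_real = phase * 2 * n / math.pi                    # step 5a
        j = round(j_real) - 1                               # 0-based left index
        if abs(j_real - round(j_real)) > 1e-6 or j not in ident[i]:
            continue
        value = sign * abs(res_i[i])
        if abs(verif[i][j] * value - res_v[i]) > 1e-6:      # step 5b
            continue
        x_hat[j] = value                                    # step 6
        for r in graph[j]:
            res_i[r] -= ident[r][j] * value
            res_v[r] -= verif[r][j] * value
            if abs(res_v[r]) <= TOL and r in neighbourly:
                neighbourly.remove(r)
    return x_hat

rng = random.Random(0)
# Hypothetical graph with n = 8, m' = 6, d = 3; graph[j] = right neighbours of left node j.
graph = [[0, 3, 5], [0, 1, 2], [1, 4, 5], [2, 3, 5],
         [0, 2, 4], [2, 3, 4], [1, 3, 5], [0, 4, 5]]
x = [0.0, 1.0, 0.0, 0.0, 0.0, -2.0, 0.0, 0.0]   # 2-sparse, one negative entry
ident, verif = make_entries(graph, 6, rng)
x_hat = decode(graph, ident, verif, measure(ident, x), measure(verif, x), rng)
print(x_hat)
```

In this instance right node 2 is initially a non-leaf (it sees both non-zero coordinates), so picking it first wastes an iteration. It becomes a leaf once the other coordinate has been peeled off, mirroring the behaviour illustrated in Figure 8.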
Figure 4. Initialization: The (true) x equals (0, 1, 0, 1, 0) (and hence $S(x) = \{2, 4\}$). Also note that nodes 1 and 3 on the right of Q are $S(x)$-leaf nodes, as defined in Property 4. However, all of this is unknown to the decoder a priori. The decoder sets the (starting) estimate $\hat{x}(0)$ of the reconstruction vector $\hat{x}$ to the all-zeros vector. The (starting) gap vector $\tilde{y}$ is set to equal y, which in turn equals the corresponding 4 pairs of identification and verification observations on the right-hand side of Q. The specific values of $\theta^{(V)}_{i,j}$ in the verification observations do not matter currently; all that matters is that, given x, each of the four verification observations is non-zero (with high probability over the choices of $\theta^{(V)}_{i,j}$). Hence the (starting) value of the neighbourly set equals {1, 2, 3, 4}. This step takes $O(k)$ steps, just to initialize the neighbourly set. By the end of the decoding algorithm (if it runs successfully), the tables will be turned: all the entries on the right of Q will equal zero, and (at most) k entries on the left of Q will be non-zero.

Figure 5. Failed identification: In this, the first iteration, the decoder randomly picks the index i = 2 from the neighbourly set {1, 2, 3, 4}, and checks the phase of the corresponding gap vector identification observation $\tilde{y}^{(I)}_2$. Since this equals $\pi/4$, which is not in the set of possible phases in the 2nd identification row of A (which are all multiples of $\pi/6$), the decoder declares a failure in this iteration. In particular, the decoder is unable to (currently) use $\tilde{y}^{(I)}_2$ to identify a non-zero location of x, since the second node on the right of Q is not an $S(x)$-leaf node. So far, the reconstruction vector $\hat{x}$, the gap vector $\tilde{y}$, and the neighbourly set {1, 2, 3, 4} are all still unchanged. This entire iteration takes a constant number of steps.







Figure 6. Passed identification, failed verification: In the second iteration, a potentially more serious failure could happen. In particular, suppose the decoder randomly picks the index i = 4 from the neighbourly set {1, 2, 3, 4} (note that 4 is also not an $S(x)$-leaf node), and checks the phase of the corresponding gap vector identification observation $\tilde{y}^{(I)}_4$. It just so happens that the value of x is such that this corresponds to a phase of $\pi/6$. But as can be seen from the matrix in Figure 3, for i = 4 this corresponds to $a^{(I)}_{4,j}$ for j = 3. Hence the decoder would make a "false identification" of j = 3, and estimate that $\hat{x}_3$ equals the magnitude of $\tilde{y}^{(I)}_4$, which would equal $\sqrt{3}$. This is where the verification entries and verification observations save the day. Recall that each verification entry is chosen uniformly at random (with sufficient bit precision) from $[0, 2\pi)$, independently of both x and the other entries of A. Hence the probability that $\sqrt{3}$ (the misdirected value of $\hat{x}_3$) times the corresponding verification entry $a^{(V)}_{4,3}$ equals $\tilde{y}^{(V)}_4$ is "small". Hence the decoder in this case too declares the iteration to be a failure, and leaves the reconstruction vector $\hat{x}$, the gap vector $\tilde{y}$, and the neighbourly set {1, 2, 3, 4} all still unchanged. This entire iteration takes a constant number of steps.

Figure 7. Passed identification, passed verification: Now, in the third iteration, suppose the decoder randomly picks the index i = 1 from the neighbourly set {1, 2, 3, 4} (note that 1 is an $S(x)$-leaf node). In this case, the phase of the corresponding gap vector identification observation $\tilde{y}^{(I)}_1$ equals $\pi/3$. As can be seen from the matrix in Figure 3, for i = 1 this corresponds to $a^{(I)}_{1,j}$ for j = 4. Hence the decoder makes a "correct identification" of j = 4, and estimates (also correctly) that $\hat{x}_4$ equals the magnitude of $\tilde{y}^{(I)}_1$, which equals 1. On checking with the verification entry, the decoder observes also that 1 (the detected value of $\hat{x}_4$) times the corresponding verification entry $a^{(V)}_{1,4}$ equals $\tilde{y}^{(V)}_1$. Hence it updates the value of $\hat{x}_4$ to 1, the neighbourly set to {2, 3, 4}, and $\tilde{y}$ to the values shown (only the three indices 1, 3 and 4 on the right need to be changed). At this point, note that the set of still-undecoded non-zero coordinates also changes from {2, 4} to the singleton set {2}. This entire iteration takes a constant number of steps.



Figure 8. Termination: In the fourth iteration, the decoder randomly picks i = 4 from the neighbourly set {2, 3, 4}. Recall that in the second iteration this choice of i did not aid in decoding. However, now that node 4 on the right of Q has been "cleaned up", it is now a leaf node for $S'(x)$. This demonstrates the importance of not "throwing away" information which seems useless at some point in time. Hence, analogously to the process in Figure 7, the decoder estimates the value of $\hat{x}_2$ to be 1, updates the neighbourly set to the empty set, and $\tilde{y}$ to the all-zero vector (all in a constant number of steps). Since the gap vector is zero, this indicates to the decoder that it should output $\hat{x}$ as its estimate of x, and terminate.
F. Expected number of iterations

We first argue that, with constant probability, each iteration results in recovering a new non-zero coordinate of x. Towards this, for each t = 1, 2, ..., let $S(t)$ be the support of $x - \hat{x}(t)$. Note that $\mathcal{D}(t) = N(S(t))$, the set of right nodes neighbouring $S(t)$, and $S(x) = S(1) \supseteq S(2) \supseteq \ldots$.

Then, according to Lemma 3 and the way we generate the measurement matrix A, with high probability, for each t, the probability that the node $i(t)$ picked from $\mathcal{D}(t)$ is an $S(t)$-leaf node is lower bounded by 1/2. For such a node, exactly one non-zero coordinate in $S(t)$ completely determines $\tilde{y}^{(I)}_{i(t)}(t)$ and $\tilde{y}^{(V)}_{i(t)}(t)$. The algorithm identifies this coordinate as j for the t-th iteration and, at the end of the iteration, recovers $x_j$. Thus, whenever $i(t)$ is an $S(t)$-leaf node, the set of recovered coordinates increases by 1. When $i(t)$ is not an $S(t)$-leaf node, our reconstruction process wastes one iteration and starts another iteration by picking another node from the neighbourly set uniformly at random. Hence, the operations among different iterations are independent, and each iteration succeeds with probability at least 1/2.

Since there are at most k non-zero coordinates in x and each iteration succeeds with probability at least 1/2, the number of iterations before the algorithm terminates is dominated by a Pascal (negative binomial) distribution with parameters $(k, 1/2)$. The expected number of iterations is therefore at most $2k$.
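The $2k$ figure is just the mean $k/p$ of the number of Bernoulli($p$) trials needed to collect k successes, evaluated at $p = 1/2$. A small simulation, with illustrative values of k and of the number of trials (both are assumptions made for this sketch), makes this concrete.

```python
import random

def iterations_until_k_successes(k, p, rng):
    """Count independent Bernoulli(p) trials until the k-th success."""
    iterations = successes = 0
    while successes < k:
        iterations += 1
        if rng.random() < p:
            successes += 1
    return iterations

rng = random.Random(1)
k, p, trials = 50, 0.5, 2000          # illustrative values
mean_iters = sum(iterations_until_k_successes(k, p, rng) for _ in range(trials)) / trials
print(f"empirical mean = {mean_iters:.1f}, k / p = {k / p:.1f}")
```

If the per-iteration success probability is larger than 1/2, the same computation gives a smaller mean, so $2k$ acts as an upper bound.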



G. Correctness 

Next, we show that $\hat{x} = x$ with high probability. To show this, it suffices to show that each non-zero update to the estimate $\hat{x}(t)$ sets a previously untouched coordinate to the correct value with high probability.

Note that if $i(t)$ is a leaf node for $S(t)$, and if all non-zero coordinates of $\hat{x}(t)$ are equal to the corresponding coordinates in x, then the decoder correctly identifies the parent node $j(t) \in S(t)$ for the leaf node $i(t)$ as the unique coordinate that passes the phase identification and verification checks.

Thus, the t-th iteration ends with an erroneous update only if

$$\angle\Big(\sum_{p \in N(\{i(t)\})} x_p\, e^{\iota p \pi/(2n)}\Big) = \frac{j\pi}{2n} \quad \text{and} \quad \angle\Big(\sum_{p \in N(\{i(t)\})} x_p\, e^{\iota \theta^{(V)}_{i(t),p}}\Big) = \theta^{(V)}_{i(t),j}$$

for some j such that there is more than one non-zero term in the summations on the left.

Since $V(i(t), j)$, which determines the verification phase $\theta^{(V)}_{i(t),j}$, is drawn uniformly at random from $\{1, 2, \ldots, 4n\}$, the probability that the second equality holds with more than one non-zero term in the summation on the left is at most $1/(4n)$. The above analysis gives an upper bound of $1/(4n)$ on the probability of an incorrect update in a single iteration. Finally, as the total number of updates is at most k, by applying a union bound over the updates, the probability of incorrect decoding is upper bounded by $k/(4n)$. Since $k = o(n)$ by assumption, it follows that the error probability vanishes as n and k grow without bound.



H. Remarks on the Reconstruction process for exactly k-sparse signals 

We elaborate on these choices of entries of A in the remarks below, which also give intuition about the reconstruction process outlined in Section II-E.

Remark 1: In fact, it is not critical that (1) be used to assign the identification entries. As long as j can be "quickly" (computationally efficiently) identified from the phases of $a^{(I)}_{i,j}$ (as outlined in Remark 2 below, and specified in more detail in Section II-E), this suffices for our purpose. This is the primary reason we call these entries identification entries.

Remark 2: The reason for the choice of phases specified in (1) is as follows. Suppose $S(x)$ corresponds to the support (set of non-zero values) of x. Suppose $y_i$ corresponds to an $S(x)$-leaf node; then by definition $y^{(I)}_i$ equals $a^{(I)}_{i,j} x_j$ for some j in $\{1, \ldots, n\}$ (if $y_i$ corresponds to an $S(x)$-non-leaf node, then in general $y^{(I)}_i$ depends on two or more $x_j$). But $x_j$ is a real number. Hence examining the phase of $y^{(I)}_i$ enables one to efficiently compute $j\pi/(2n)$, and hence j. It also allows one to recover the magnitude of $x_j$, simply by computing the magnitude of $y^{(I)}_i$.
Remark 3: The choice of phases specified in (1) divides the set of allowed phases (the interval $[0, \pi/2]$) into n distinct values. Two things are worth noting about this choice.

1) We consider the interval $[0, \pi/2]$ rather than the full range $[0, 2\pi)$ of possible phases since we wish to use the phase measurements to also recover the signs of the $x_j$. If the phase of $y_i$ falls within the interval $[0, \pi/2]$, then (still assuming that $y_i$ corresponds to an $S(x)$-leaf node) $x_j$ must have been positive. On the other hand, if the phase of $y_i$ falls within the interval $[\pi, 3\pi/2]$, then $x_j$ must have been negative. (It can be directly verified that the phase of an $S(x)$-leaf node $y_i$ can never be outside these two intervals. This wastes roughly half of the set of possible phases we could have used for identification purposes, but it makes notation easier.)

2) The choice in (1) divides the interval $[0, \pi/2]$ into n distinct values. However, in expectation over Q the actual number of non-zero entries in a row of A is $O(n/k)$, so on average one only needs to choose $O(n/k)$ distinct phases in (1), rather than the worst case of n values. This has the advantage that one only needs $O(\log(n/k))$ bits of precision to specify distinct phase values (and in fact we claim that this is the level of precision required by our algorithm). However, since we analyze only left-regular Q, the degrees of nodes on the right will in general vary stochastically around this expected value. If k is "somewhat large" (for instance $k = O(n)$), then the degrees will not be very tightly concentrated around their mean. One way around this is to choose Q uniformly at random from the set of bipartite graphs with n nodes (each of degree d) on the left and $m'$ nodes (each of degree $dn/m'$) on the right. This would require a more intricate proof of the $S(x)$-expansion property defined in Property 3 and proved in Lemma 1. For the sake of brevity, we omit this proof here.
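Remarks 2 and 3 amount to a small decoding rule for a single leaf-node observation: read off the phase, fold the interval $[\pi, 3\pi/2]$ back onto $[0, \pi/2]$ to recover the sign, and convert the phase to an index. The sketch below assumes the identification entries of (1) with a 1-based index j, and uses floating-point tolerances in place of the paper's bit-precision arguments; the function name `identify_from_leaf` is hypothetical.

```python
import cmath
import math

def identify_from_leaf(y_ident, n, tol=1e-9):
    """Return (j, x_j) with 1-based j if y_ident equals x_j * exp(i*j*pi/(2n)), else None."""
    phase = cmath.phase(y_ident) % (2 * math.pi)      # phase in [0, 2*pi)
    sign = 1.0
    if math.pi - tol <= phase <= 1.5 * math.pi + tol:  # negative x_j shifts the phase by pi
        phase -= math.pi
        sign = -1.0
    j_real = phase * 2 * n / math.pi
    j = round(j_real)
    if not (1 <= j <= n) or abs(j_real - j) > 1e-6:
        return None                                    # not the signature of a leaf node
    return j, sign * abs(y_ident)

n = 10
leaf = identify_from_leaf(-2.5 * cmath.exp(1j * 7 * math.pi / (2 * n)), n)
print(leaf)                                            # index 7, value close to -2.5
print(identify_from_leaf(1 + 0j, n))                   # phase 0 matches no index
```

Observations whose phase falls outside the two admissible intervals, or between grid points, are rejected, which is what happens at non-leaf nodes in the typical case.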

Remark 4: In fact, the recent work of [32] demonstrates an alternative analytical technique (bypassing the expansion arguments outlined in this work), involving analysis of properties of the "2-core" of random hyper-graphs, that allows for a tight characterization of the number of measurements required by SHO-FA to reconstruct x from y and A, rather than the somewhat loose (though order-optimal) bounds presented in this work. Since our focus in this work is a simple proof of order-optimality (rather than the somewhat more intricate analysis required for the tight characterization) we again omit this proof here.^18



I. Database query

A useful property of our construction of the matrix A is that any desired signal component $x_j$ can be reconstructed with constant probability, given the measurement vector $y = Ax$, in constant time. The following Lemma makes this precise. The proof follows from a simple probabilistic argument and is included in Appendix D.

Lemma 4. Let x be k-sparse. Let $j \in \{1, 2, \ldots, n\}$ and let $A \in \mathbb{C}^{m \times n}$ be randomly drawn according to SHO-FA. Then, there exists an algorithm $\mathcal{A}$ such that given inputs $(j, y)$, $\mathcal{A}$ produces an output $\hat{x}_j$ with probability at least $1 - (d/c)^d$ such that $\hat{x}_j = x_j$ with probability $1 - o(1/k)$.
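One plausible reading of such a query algorithm, offered only as a sketch since the actual procedure is given in the appendix and not in this section, is to inspect just the d right nodes that left node j is connected to and apply the identification and verification tests of Section II-E to each. The function name `query_component`, the dictionary-based inputs, the tolerance, and the rule that a zero verification observation implies $x_j = 0$ are all assumptions made for illustration.

```python
import cmath
import math

def query_component(j, n, neighbours, verif_col, y_ident, y_verif, tol=1e-6):
    """Try to recover x_j (1-based j) from the d right nodes that j is connected to.

    neighbours: right-node indices of left node j; verif_col[i] is a^(V)_{i,j}.
    Returns the estimate, or None if no neighbour is a leaf node with parent j.
    """
    a_ident = cmath.exp(1j * j * math.pi / (2 * n))      # equation (1)
    for i in neighbours:
        if abs(y_verif[i]) <= tol:
            return 0.0                                    # assumed: nothing non-zero touches node i
        candidate = y_ident[i] / a_ident                  # equals x_j if i is a leaf with parent j
        if (abs(candidate.imag) <= tol
                and abs(verif_col[i] * candidate.real - y_verif[i]) <= tol):
            return candidate.real
    return None

def a(j, n):
    return cmath.exp(1j * j * math.pi / (2 * n))

# Hypothetical instance: n = 10, x_7 = -2.5 and x_3 = 1.0 collide at right node 0,
# while right nodes 4 and 5 see x_7 alone.
n = 10
v7 = {0: cmath.exp(0.3j), 4: cmath.exp(1.1j), 5: cmath.exp(0.7j)}
v3_at_0 = cmath.exp(0.9j)
y_ident = {0: -2.5 * a(7, n) + 1.0 * a(3, n), 4: -2.5 * a(7, n), 5: -2.5 * a(7, n)}
y_verif = {0: -2.5 * v7[0] + 1.0 * v3_at_0, 4: -2.5 * v7[4], 5: -2.5 * v7[5]}
estimate = query_component(7, n, [0, 4, 5], v7, y_ident, y_verif)
print(estimate)
```

The loop touches at most d measurements, so the running time is constant. The query fails only when all d neighbours of j are shared with other non-zero coordinates, which is the event bounded by $(d/c)^d$ in Lemma 4.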

J. SHO-FA for sparse vectors in different bases 

In the setting of SHO-FA we consider k-sparse input vectors x. In fact, we can also deal with the case where x is sparse in a certain basis that is known a priori to the decoder,^19 say $\Psi$, which means that $x = \Psi w$ where w is a k-sparse vector. Specifically, in this case we write the measurement vector as $y = Bx$, where $B = A\Psi^{-1}$. Then, $y = A\Psi^{-1}\Psi w = Aw$, where A is chosen based on the structure of Q and w is a k-sparse vector. We can then apply SHO-FA to reconstruct w and consequently $x = \Psi w$. What has been discussed here covers the case where x is sparse itself, for which we can simply take $\Psi = I$ and $x = w$.
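The identity $y = Bx = Aw$ can be checked numerically on a toy instance. In the sketch below, the basis $\Psi$ is a normalized $4 \times 4$ Hadamard matrix (orthonormal and symmetric, so that $\Psi^{-1} = \Psi$), and A is an arbitrary small stand-in rather than a genuine SHO-FA matrix; both choices, and the helper names `matmul` and `matvec`, are assumptions made purely for illustration.

```python
def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[r][t] * Q[t][c] for t in range(len(Q))) for c in range(len(Q[0]))]
            for r in range(len(P))]

def matvec(P, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(P[r][t] * v[t] for t in range(len(v))) for r in range(len(P))]

H = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
Psi = [[h / 2 for h in row] for row in H]      # orthonormal and symmetric: Psi^{-1} = Psi
Psi_inv = Psi
A = [[1, 0, 1j, 0], [0, 1j, 0, 1]]             # stand-in for a SHO-FA measurement matrix
w = [0.0, 3.0, 0.0, 0.0]                       # 1-sparse coefficient vector
x = matvec(Psi, w)                             # the signal itself is dense
B = matmul(A, Psi_inv)                         # matrix actually applied to x
y = matvec(B, x)
print(x, y, matvec(A, w))
```

Although x has no zero entries, the measurements y coincide with $Aw$, so a decoder designed for sparse inputs can be run on y to recover w and then $x = \Psi w$.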

K. Information-theoretically optimal number of bits 

We recall that the reconstruction goal for SHO-FA is to reconstruct x up to relative error $2^{-P}$. That is,

$$\|x - \hat{x}\|_1 / \|x\|_1 \le 2^{-P}.$$

We first present a sketch of an information-theoretic lower bound of $\Omega(k(P + \log n))$ bits that holds for any algorithm that outputs a k-sparse vector that achieves this goal with high probability.

To see this is true, consider the case where the locations of the k non-zero entries in x are chosen uniformly at random among all the n entries and the value of each non-zero entry is chosen uniformly at random from the set $\{1, \ldots, 2^P\}$. Then recovering even the support requires at least $\log\big(2^{kP}\binom{n}{k}\big)$ bits, which is $\Omega(k(P + \log(n/k)))$.^20

^18 We thank the anonymous reviewers who examined a previous version of this work for pointing out the extremely relevant techniques of [32] and [34] (though the problems considered in those works were somewhat different).

^19 For example, "smooth" signals are sparse in the Fourier basis and "piecewise smooth" signals are sparse in wavelet bases.

^20 Stirling's approximation (c.f. [47, Chapter 1]) is used in bounding from below the combinatorial term $\binom{n}{k}$.

Figure 9. An example of a physical system that "naturally" generates ensembles of sparse A that SHO-FA can use: Suppose there are k cellphones (out of a set of n possible different cellphones in the whole world) in a certain neighbourhood that has a base-station. The goal is for the j-th cellphone to communicate its information ($x_j$) to the base-station at least once per frame of ck consecutive time-slots. The challenge is to do so in a distributed manner, since multiple cellphones transmitting at the same time i would result in a linear combination $y_i = \sum_j a_{ij} x_j$ of their transmissions reaching the base-station, where $a_{ij}$ corresponds to the channel gain from cellphone j to the base-station during time-slot i. Each cellphone transmits $x_j$ to the base-station a constant (d) number of times in each frame; the set of d time-slots in each frame that cellphone j transmits in is chosen by j uniformly at random from the set of all $\binom{ck}{d}$ sets of slots.

Also, at least a constant fraction of the k non-zero entries of x must be correctly estimated to guarantee the desired relative error. Hence $\Omega(k(P + \log n))$ is a lower bound on the measurement bit-complexity.
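The counting behind this lower bound can be evaluated directly. The sketch below computes $\log_2\big(2^{kP}\binom{n}{k}\big)$ exactly and compares it with the simpler expression $k(P + \log_2(n/k))$, which follows from the standard inequality $\binom{n}{k} \ge (n/k)^k$. The function name `information_bits` and the parameter values are illustrative assumptions only.

```python
import math

def information_bits(n, k, P):
    """log2 of the number of (support, values) pairs: C(n, k) * 2^(k * P)."""
    return math.log2(math.comb(n, k)) + k * P

n, k, P = 10 ** 6, 100, 16                       # illustrative values only
exact = information_bits(n, k, P)
simple_bound = k * (P + math.log2(n / k))        # uses C(n, k) >= (n / k)^k
print(f"{exact:.0f} bits >= {simple_bound:.0f} bits")
```

For $k = O(n^{1-\Delta})$ we have $\log(n/k) \ge \Delta \log n$, which is how the $\log(n/k)$ term turns into the $\log n$ term of the stated bound.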

The following arguments show that the total number of bits used in our algorithm is information-theoretically order-optimal for any $k = O(n^{1-\Delta})$ (for any $\Delta > 0$). First, to represent each non-zero entry of x, we need to use arithmetic of $\Omega(P + \log k)$ bit precision. Here the P term is needed to attain the required relative error of reconstruction, and the $\log k$ term accounts for the error induced by finite-precision arithmetic (say, for instance, by floating point numbers) in $O(k)$ iterations (each involving a constant number of finite-precision additions and unit-magnitude multiplications). Second, for each identification step, we need to use $\Omega(\log n + \log k)$ bit-precision arithmetic. Here the $\log n$ term is needed so that the identification measurements can uniquely specify the locations of non-zero entries of x. The $\log k$ term again accounts for the error induced in $O(k)$ iterations. Third, for each verification step, the number of bits we use is $3\log k$. Here, by the Schwartz-Zippel Lemma [48], [49], $2\log k$ bit-precision arithmetic guarantees that each verification step is valid with probability at least $1 - 1/k^2$; a union bound over all $O(k)$ verification steps guarantees that all verification steps are correct with probability at least $1 - O(1/k)$. Therefore, the total number of bits needed by SHO-FA is $O(k(\log n + P))$. As claimed, this matches, up to a constant factor, the lower bound sketched above.

L. Universality 

While the ensemble of matrices {A} we present above has carefully chosen identification entries, and all the non-zero verification entries have unit magnitude, we argue that in fact the implicit ideas underlying SHO-FA work for significantly more general ensembles of matrices. In particular, Property 1 only requires that the graph Q underlying A be "sparse", with a constant number of non-zero entries per column. Property 2 only requires that each non-zero entry in each row be distinct, which is guaranteed with high probability, for instance, if each entry is chosen i.i.d. from any distribution with sufficiently large support. An example of such a scenario is shown in Figure 9. This naturally motivates the application of SHO-FA to a variety of scenarios, e.g., neighbour discovery in wireless communication [38].
III. Approximate reconstruction in the presence of noise

A prominent aspect of the design presented in the previous section is that it relies on exact determination of all the phases as well as magnitudes of the measurement vector $Ax$. In practice, however, we often desire that the measurement and reconstruction be robust to corruption both before and during measurements. In this section, we show that our design may be modified slightly such that, with a suitable decoding procedure, the reconstruction is robust to such "noise".

We consider the following setup. Let $x \in \mathbb{R}^n$ be a k-sparse signal with support $S(x) = \{j : x_j \neq 0\}$. Let $z \in \mathbb{R}^n$ have support $\{1, 2, \ldots, n\} \setminus S(x)$, with each $z_j$ distributed according to a Gaussian distribution with mean 0 and variance $\sigma_z^2$. Denote the measurement matrix by $A \in \mathbb{C}^{m \times n}$ and the measurement vector by $y \in \mathbb{C}^m$. Let $e \in \mathbb{C}^m$ be the measurement noise, with each $e_i$ distributed as a complex Gaussian with mean 0 and variance $\sigma_e^2$ along each axis. The measurement vector y is related to the signal as

$$y = A(x + z) + e.$$

We propose a design procedure for A satisfying the following properties. 

Theorem 2. Let $k = O(n^{1-\Delta})$ for some $\Delta > 0$. There exists a reconstruction algorithm SHO-FA for $A \in \mathbb{C}^{m \times n}$ such that
(i) $m = ck$.

(ii) SHO-FA consists of at most $4k$ iterations, each involving a constant number of arithmetic operations with a precision of $O(\log n)$ bits.

(iii) With probability $1 - o(1/k)$ over the design of A and randomness in e and z,

$$\|\hat{x} - x\|_1 \le C\big(\|z\|_1 + (\log k)^2 \|e\|_1\big)$$

for some $C = C(\sigma_z, \sigma_e) > 0$.

Recall that in the exactly $k$-sparse case, the decoding in the $t$-th iteration relies on first finding an $S(t)$-leaf node, 
then decoding the corresponding signal coordinate and updating the undecoded measurements. In this procedure, it 
is critical that each iteration operates with low reconstruction errors, as an error in an earlier iteration can propagate 
and cause potentially catastrophic errors. In general, one of the following events may result in any iteration ending 
with a decoded signal value that is far from the true signal value: 

(a) The decoder picks an index outside the set $\{i : (Ax)_i \neq 0\}$, but in the set $\{i : (A(x + z) + e)_i \neq 0\}$. 

(b) The decoder picks an index within the set $\{i : (Ax)_i \neq 0\}$ that is also a leaf for $S$ with parent node $j$, 
but the presence of noise results in the decoder identifying (and verifying) a node $j' \neq j$ as the parent and, 
subsequently, incorrectly decoding the signal at $j'$. 

(c) The decoder picks an index within the set $\{i : (Ax)_i \neq 0\}$ that is not a leaf for $S$, but the presence of noise 
results in the decoder identifying (and verifying) a node $j$ as the parent and, subsequently, incorrectly decoding 
the signal at $j$. 

(d) The decoder picks an index within the set $\{i : (Ax)_i \neq 0\}$ that is a leaf for $S$ with parent node $j$, which it 
also identifies (and verifies) correctly, but the presence of noise introduces a small error in decoding the signal 
value. This error may also propagate to the next iteration and act as "noise" for the next iteration. 

To overcome these hurdles, our design takes the noise statistics into account to ensure that each iteration is resilient 
to noise with a high probability. This is achieved through several new ideas, which are presented next for ease 
of exposition. We then perform a careful analysis of the corresponding decoding algorithm and show that, under 
certain regularity conditions, the overall failure probability can be made arbitrarily small, so that the output is a 
reconstruction that is robust to noise. Key to this analysis is bounding the effect of propagation of estimation error 
as the decoder steps through the iterations. 

Footnote: For simplicity, the analysis presented here relies only on an upper bound on the length of the path through which the estimation error 
introduced in any iteration can propagate. This bound follows from known results on the size of the largest components in sparse hypergraphs [50]. 
We note, however, that a tighter analysis that relies on a finer characterization of the interaction between the size of these components and 
the contribution to total estimation error may lead to better bounds on the overall estimation error. Indeed, as shown in [34], such an analysis 
enables us to achieve a tighter reconstruction guarantee of the form $\|\hat{x} - x\|_1 = O(\|z\|_1 + \|e\|_1)$. 
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A. Key ideas 

1) Truncated reconstruction: We observe that in the presence of noise, it is unlikely that signal values whose 
magnitudes are comparable to that of the noise values can be successfully recovered. Thus, it is futile for the 
decoder to try to reconstruct these values as long as the overall penalty in $\ell_1$-norm is not high. The following 
argument shows that this is indeed the case. Let 

$$S_\delta(x) = \{j : |x_j| < \delta/k\} \qquad (2)$$

and let $x_{S_\delta}$ be the vector defined as 

$$(x_{S_\delta})_j = \begin{cases} 0 & j \notin S_\delta(x), \\ x_j & j \in S_\delta(x). \end{cases}$$

Similarly, define $x_{S_\delta^c}$, which has non-zero entries only within the set $S(x) \setminus S_\delta(x)$. The following sequence of 
inequalities shows that the total $\ell_1$ norm of $x_{S_\delta}$ is small (only the at most $k$ non-zero entries of $x$ contribute to the sum): 

$$\|x_{S_\delta}\|_1 = \sum_{j \in S_\delta(x)} |x_j| < k \cdot \frac{\delta}{k} = \delta. \qquad (3)$$

Further, as an application of the triangle inequality and the bound in (3), it follows that 

$$\|\hat{x} - x\|_1 = \|\hat{x} - x_{S_\delta^c} - x_{S_\delta}\|_1 \le \|\hat{x} - x_{S_\delta^c}\|_1 + \|x_{S_\delta}\|_1 < \|\hat{x} - x_{S_\delta^c}\|_1 + \delta. \qquad (4)$$

Keeping the above in mind, we rephrase our reconstruction objective to satisfy the following criterion with a 
high probability: 

$$\|\hat{x} - x_{S_\delta^c}\|_1 \le C_1\left(\|z\|_1 + k^2 \|e\|_1\right), \qquad (5)$$

while simultaneously ensuring that our choice of the parameter $\delta$ satisfies 

$$\delta < C_2 \|z\|_1 \qquad (6)$$

for some $C_2$, with a high probability. 
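As a quick numerical illustration of the truncation argument, the following Python sketch splits a toy signal around the threshold δ/k and checks that the discarded part has ℓ1 norm below δ. The helper name `truncation_split` and the example values are assumptions made for illustration; the sketch also assumes x has at least one non-zero entry.

```python
def truncation_split(x, delta):
    """Split x into (x_small, x_large) around the threshold delta / k,
    where k is the number of non-zero entries of x."""
    k = sum(1 for v in x if v != 0)
    thresh = delta / k
    # x_small keeps the entries below the threshold (the set S_delta(x)).
    x_small = [v if abs(v) < thresh else 0.0 for v in x]
    # x_large keeps the remaining, "sufficiently large" entries.
    x_large = [v if abs(v) >= thresh else 0.0 for v in x]
    return x_small, x_large

x = [0.0, 2.0, 0.0, -0.01, 0.5, 0.0, 0.02, -3.0]   # toy signal with k = 5
delta = 0.25
x_small, x_large = truncation_split(x, delta)
l1_small = sum(abs(v) for v in x_small)             # stays below delta
```

Here the threshold is 0.25/5 = 0.05, so only the entries -0.01 and 0.02 are dropped, and the decoder would only be asked to recover the three large entries.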

2) Phase quantization: In the noisy setting, even when $i$ is a leaf node for $S(x)$, the phase of $y_i$ may differ 
from the phase assigned by the measurement. This is shown geometrically in Figure 11a for a measurement matrix 
$A'$. To overcome this, we modify our decoding algorithm to work with "quantized" phases, rather than the actual 
received phases. The idea behind this is that if $i$ is a leaf node for $S(x)$, then quantizing the phase to one of 
the values allowed by the measurement identifies the correct phase with a high probability. The following lemma 
facilitates this simplification. 

Lemma 5 (Almost bounded phase noise). Let $x, z \in \mathbb{R}^n$ with $|x_j| > \delta/k$ for each $j$. Let $A' \in \mathbb{C}^{m' \times n}$ be a complex 
valued measurement matrix with the underlying graph Q. Let $i$ be a leaf node for $S(x)$. Let $\Delta\theta_i = |\angle y_i - \angle (A'x)_i|$. 
Then, for every $\alpha > 0$, 

$$E_{z,e}(\Delta\theta_i) \le \sqrt{\frac{2\pi k^2 \left(dn\sigma_z^2/ck + \sigma_e^2\right)}{\delta^2}}$$

and 



Pr (A9i > aE,^,{A9i)) < \e-^^'l-\ 



Proof: See Appendix E. ■ 
For a desired error probability $\epsilon'$, the above lemma stipulates that it suffices to let $\alpha = (1/2) \log(1/2\epsilon')$. We 
examine the effect of phase noise in more detail in Appendix F. 
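The quantization idea can be sketched as follows, assuming the allowed phases form the grid {0, π/(2(L-1)), ..., π/2} with L levels, as in the measurement design of this section. The function name `quantize_phase`, the wrap-around handling near 0, and the clamping to the grid are assumptions of this sketch rather than details taken from the text.

```python
import cmath
import math

def quantize_phase(y, num_levels):
    """Round the phase of y to the nearest allowed value.

    Allowed phases are multiples of pi / (2 * (num_levels - 1)) in [0, pi/2].
    The phase is taken mod pi so that a negative signal value, which adds pi
    to the phase, maps to the same grid point.
    Returns (level_index, quantized_phase).
    """
    step = math.pi / (2 * (num_levels - 1))
    phase = cmath.phase(y) % math.pi
    # Assumption: a phase just below pi is a noisy version of phase 0.
    if phase > 3 * math.pi / 4:
        phase -= math.pi
    level = min(max(round(phase / step), 0), num_levels - 1)
    return level, level * step

# A leaf output with true phase pi/3 (level 2 of 4) and 0.1 rad of phase noise.
level, q = quantize_phase(3.0 * cmath.exp(1j * (math.pi / 3 + 0.1)), 4)
```

With four levels the grid spacing is π/6, so a displacement of 0.1 rad is well within the half-spacing tolerance and the correct level is recovered.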
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Figure 10. The black curve corresponds to the magnitudes of x + z (for ease of visual presentation, the components of x have been sorted 
in decreasing order of magnitude and placed in the first k components of the signal, but the components of z are unsorted). The blue curve 
corresponds to our reconstruction $\hat{x}$ of x. Note that we only attempt to reconstruct components of x that are "sufficiently large" (that is, 
we make no guarantees about correct reconstruction of components of x in $S_\delta(x)$, i.e., those components of x that are smaller than some 
"threshold" $\delta/k$). Here $\delta$ is a parameter of code-design to be specified later. As shown in Section III-A, as long as $\delta$ is not "too large", this 
relaxation does not violate our relaxed reconstruction criteria. 




(a) Maximum phase displacement occurs when the contribution due to noise, i.e., $(Az)_i + e_i$, is orthogonal to the measurement $y_i$. 

(b) Maximum magnitude displacement takes place when the contribution due to noise is aligned with $(Ax)_i$. 

Figure 11. The effect of noise on a measurement output 



3) Repeated measurements: Our algorithm works by performing a series of $\Gamma > 1$ identification and verification 
measurements in each iteration, instead of a single measurement of each type as done in the exactly $k$-sparse case. 
The idea behind this is that, in the presence of noise, even though a single set of identification and verification 
measurements cannot exactly identify the coordinate $j$ from the observed $y_i$, it helps us narrow down the set of 
coordinates $j$ that can possibly contribute to give the observed phase. Performing measurements repeatedly, each 
time with a different measurement matrix, helps us identify a single $j$ with a high probability. 

We implement the above idea by first mapping each $j \in \{1, 2, \dots, n\}$ to its $\Gamma$-digit representation with digits in $G = 
\{0, 1, \dots, \lceil n^{1/\Gamma} - 1 \rceil\}$. For each $j \in \{1, 2, \dots, n\}$, let $g(j) = (g_1(j), g_2(j), \dots, g_\Gamma(j))$ be the $\Gamma$-digit representation of 
$j$. Next, perform one pair of identification and verification measurements (and corresponding phase reconstructions) per digit, 
each of which is intended to distinguish exactly one of the digits. In our construction, we only need a constant 
number of such phase measurements per iteration. See Fig. 12 for an illustrative example. 
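The digit mapping can be sketched in Python as below. Two choices here are assumptions of the sketch: the digits are taken from j - 1 (which matches the labels in Figure 12, where x_3 maps to (0, 1, 0)), and the base is computed with integer arithmetic as the smallest b with b^Γ ≥ n, which equals ⌈n^{1/Γ}⌉ without floating-point rounding issues. The function names are hypothetical.

```python
def digit_base(n, num_digits):
    """Smallest integer base b with b ** num_digits >= n."""
    b = 1
    while b ** num_digits < n:
        b += 1
    return b

def digits_of(j, n, num_digits):
    """Digit tuple of j in {1, ..., n}, most significant digit first.

    Assumption: the representation is that of j - 1, so that the n
    coordinates map to distinct tuples with digits in {0, ..., b - 1}.
    """
    b = digit_base(n, num_digits)
    v = j - 1
    out = []
    for _ in range(num_digits):
        out.append(v % b)   # peel off the least significant digit
        v //= b
    return tuple(reversed(out))

labels = {j: digits_of(j, 8, 3) for j in range(1, 9)}   # the n = 8 example
```

Each digit is then identified by its own phase measurement, so the phase grid only needs |G| points instead of n.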







(a) The decoder "randomly" picks $y_i$. Since the phase of $y_i^{(I,1)}$ is between $-\pi/4$ and $\pi/4$, the decoder can distinguish that 
the first bit of the non-zero location is 0, since the decoder can tolerate at most $\pi/4$ phase displacement for $y_i$. So, the non-zero entry 
is one of $x_1, x_2, x_3, x_4$. 

(b) The decoder "randomly" picks $y_i$ again. Since the phase of $y_i^{(I,2)}$ is between $3\pi/4$ and $5\pi/4$, the decoder can 
distinguish that the second bit of the non-zero location is 1, since the decoder can tolerate at most $\pi/4$ phase displacement for $y_i$. So, 
the non-zero entry is one of $x_3, x_4, x_7, x_8$. Combining this with the output of the first phase measurement, we conclude that the 
non-zero entry is one of $x_3$ and $x_4$. 

(c) The decoder "randomly" picks $y_i$ again. Since the phase of $y_i^{(I,3)}$ is between $-\pi/4$ and $\pi/4$, the decoder can 
distinguish that the third bit of the non-zero location is 0, since the decoder can tolerate at most $\pi/4$ phase displacement for $y_i$. So, 
the non-zero entry is one of $x_1, x_3, x_5, x_7$. Combining this with the outputs of the first and second phase measurements, we conclude 
that the non-zero entry is $x_3$. 

[Figure 12 panels: each $x_j$ is labelled with its three-bit representation, e.g. $x_4 \to (0,1,1)$, $x_7 \to (1,1,0)$, $x_8 \to (1,1,1)$.] 

Figure 12. If we were to distinguish each $j$ from 1 to 8 by a different phase, the decoder can tolerate at most $\pi/14$ phase displacement 
for any output $y_i$. Instead, we first represent each $j = 1, 2, \dots, 8$ by a three-length binary vector. Next, we perform three sets of phase 
assignments, one for each digit. It is easily seen that by allowing multiple measurements, the noise tolerance for the decoder increases. 



B. Description of measurements 

As in the exactly $k$-sparse case, we start with a randomly drawn left-regular bipartite graph Q with $n$ nodes on 
the left and $m'$ nodes on the right. 

Measurement matrix: The measurement matrix $A \in \mathbb{C}^{2m'\Gamma \times n}$ is chosen based on the graph Q. The rows of $A$ 
are partitioned into $m'$ groups, with each group consisting of $2\Gamma$ consecutive rows. The $j$-th entries of the rows 
$2(i-1)\Gamma + 1, 2(i-1)\Gamma + 2, \dots, 2i\Gamma$ are denoted by $a_{i,j}^{(I,1)}, a_{i,j}^{(I,2)}, \dots, a_{i,j}^{(I,\Gamma)}, a_{i,j}^{(V,1)}, \dots, a_{i,j}^{(V,\Gamma)}$, respectively. In 
the above notation, $I$ and $V$ are used to refer to identification and verification measurements. 

For ease of notation, for each $\gamma = 1, 2, \dots, \Gamma$, we use $A^{(I,\gamma)}$ (resp. $A^{(V,\gamma)}$) to denote the sub-matrix of $A$ whose 
$(i,j)$-th entry is $a_{i,j}^{(I,\gamma)}$ (resp. $a_{i,j}^{(V,\gamma)}$). 

We define the $\gamma$-th identification matrix $A^{(I,\gamma)}$ as follows. For each $(i,j)$, if the graph Q does not have an edge 
connecting $i$ on the right to $j$ on the left, then $a_{i,j}^{(I,\gamma)} = 0$. Otherwise, we set $a_{i,j}^{(I,\gamma)}$ to be the unit-norm complex 
number 

$$a_{i,j}^{(I,\gamma)} = e^{\iota g_\gamma(j) \pi / (2(|G| - 1))}.$$

Remark: Note here that the construction for the exactly $k$-sparse case can be recovered by setting $\Gamma = 1$, which 
results in $G = \{1, 2, \dots, n\}$ and $g_1(j) = j$. 

Next, we define the $\gamma$-th verification matrix $A^{(V,\gamma)}$ in a way similar to how we defined the verification entries 
in the exactly $k$-sparse case. For each $(i,j)$, if the graph Q does not have an edge connecting $i$ on the right to $j$ 
on the left, then $a_{i,j}^{(V,\gamma)} = 0$. Otherwise, we set 

$$a_{i,j}^{(V,\gamma)} = e^{\iota \theta_{i,j}^{(V,\gamma)}},$$

where $\theta_{i,j}^{(V,\gamma)}$ is drawn uniformly at random from $\{0, \pi/2(|G| - 1), \pi/(|G| - 1), 3\pi/2(|G| - 1), \dots, \pi/2\}$. 

Given a signal vector $x$, signal noise $z$, and measurement noise $e$, the measurement operation produces a 
measurement vector $y = A(x + z) + e$. Since $A$ can be partitioned into $\Gamma$ identification and $\Gamma$ verification sub-matrices, we 
think of the measurement vector $y$ as a collection of outcomes from $\Gamma$ successive measurement operations such 
that 

$$y^{(I,\gamma)} = A^{(I,\gamma)}(x + z) + e^{(I,\gamma)}$$

and 

$$y^{(V,\gamma)} = A^{(V,\gamma)}(x + z) + e^{(V,\gamma)}$$

are the outcomes from the $\gamma$-th measurement and $y = ((y^{(I,\gamma)}, y^{(V,\gamma)}) : 1 \le \gamma \le \Gamma)$. 
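A minimal Python sketch of how the Γ identification and verification blocks could be filled in for a small graph is given below. It stores each block as a dictionary of non-zero entries. The function name `build_blocks`, the dictionary layout, the zero-based indices, and the toy graph are all assumptions made for illustration; only the phase formulas follow the description above.

```python
import cmath
import math
import random

def build_blocks(neighbours, digits, base, rng):
    """Return (ident, verif): lists of Gamma dicts {(i, j): entry}.

    neighbours[j] lists the right nodes adjacent to left node j and
    digits[j] is the digit tuple of j. Missing keys are zero entries.
    """
    step = math.pi / (2 * (base - 1))
    num_digits = len(digits[0])
    ident = [dict() for _ in range(num_digits)]
    verif = [dict() for _ in range(num_digits)]
    for j, nbrs in enumerate(neighbours):
        for i in nbrs:
            for g in range(num_digits):
                # Identification phase encodes the g-th digit of j.
                ident[g][(i, j)] = cmath.exp(1j * digits[j][g] * step)
                # Verification phase is uniform on the grid {0, step, ..., pi/2}.
                theta = rng.randrange(base) * step
                verif[g][(i, j)] = cmath.exp(1j * theta)
    return ident, verif

rng = random.Random(0)
n, m_right, d, base = 8, 6, 3, 2            # toy sizes (assumed)
neighbours = [rng.sample(range(m_right), d) for _ in range(n)]
# Three-bit labels of 0..7, most significant bit first.
digits = [((j >> 2) & 1, (j >> 1) & 1, j & 1) for j in range(n)]
ident, verif = build_blocks(neighbours, digits, base, rng)
```

In this toy example every block has n·d = 24 non-zero entries, all of unit modulus, in line with the sparsity of the graph.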
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C. Reconstruction for approximately k-sparse signals with noisy measurements 

The decoding algorithm for this case extends the decoding algorithm presented earlier for the exactly $k$-sparse 
case by including the ideas presented in Section III-A. The total number of iterations of our algorithm is upper 
bounded by $4k$. 

1) We initialize by setting the signal estimate vector $\hat{x}(1)$ to the all-zeros vector $0^n$, and for each $\gamma = 1, 2, \dots, \Gamma$, 
we set the residual measurement identification/verification vectors $y^{(I,\gamma)}(1)$ and $y^{(V,\gamma)}(1)$ to the decoder's 
observations $y^{(I,\gamma)}$ and $y^{(V,\gamma)}$. 

Let $\mathcal{D}(1)$, the initial neighbourly set, be the set of indices $i$ at which the magnitude corresponding 
to all verification and identification vectors is greater than $\delta/k$, i.e., 

$$\mathcal{D}(1) = \bigcap_{\gamma=1}^{\Gamma} \left\{ i : |y_i^{(I,\gamma)}| > \delta/k, \ |y_i^{(V,\gamma)}| > \delta/k \right\}.$$

This plays the role of the set $\{i \le m : y_i(1) \neq 0\}$ used in the exactly $k$-sparse case. This step takes $O(k)$ steps, since merely reading $y$ to check 
these entries takes that long. 

2) The $t$-th decoding iteration accepts as its input the $t$-th signal estimate vector $\hat{x}(t)$, the $t$-th neighbourly set $\mathcal{D}(t)$, 
and the $t$-th residual measurement identification/verification vectors $((y^{(I,\gamma)}(t), y^{(V,\gamma)}(t)) : \gamma = 1, 2, \dots, \Gamma)$. In 
$O(1)$ steps it outputs the $(t+1)$-th signal estimate vector $\hat{x}(t+1)$, the $(t+1)$-th neighbourly set $\mathcal{D}(t+1)$, and the 
$(t+1)$-th residual measurement identification/verification vectors $((y^{(I,\gamma)}(t+1), y^{(V,\gamma)}(t+1)) : \gamma = 1, 2, \dots, \Gamma)$ 
after performing the following steps sequentially (each of which takes at most a constant number of atomic 
steps). 

3) Pick a random $i(t)$: The decoder picks $i(t)$ uniformly at random from $\mathcal{D}(t)$. 

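4) Compute quantized phases: For each $\gamma = 1, 2, \dots, \Gamma$, compute the current identification angles $\hat{\theta}_t^{(I,\gamma)}$ 
and the current verification angles $\hat{\theta}_t^{(V,\gamma)}$, defined as follows: 

$$\hat{\theta}_t^{(I,\gamma)} = \frac{\pi}{2(|G| - 1)} \left[ \frac{2(|G| - 1) \left( \angle y_{i(t)}^{(I,\gamma)}(t) \bmod \pi \right)}{\pi} \right],$$

$$\hat{\theta}_t^{(V,\gamma)} = \frac{\pi}{2(|G| - 1)} \left[ \frac{2(|G| - 1) \left( \angle y_{i(t)}^{(V,\gamma)}(t) \bmod \pi \right)}{\pi} \right].$$

In the above, $[\cdot]$ denotes the closest integer function. Since there are $\Theta(n)$ different phase vectors, to perform 
this computation, $O(\log n)$ precision and $O(1)$ steps suffice. 

For each $\gamma = 1, 2, \dots, \Gamma$, let $\hat{g}_\gamma(t) = 2(|G| - 1)\hat{\theta}_t^{(I,\gamma)}/\pi$ be the current estimate of the $\gamma$-th digit and let $j(t)$ be 
the number whose representation in $G$ is $(\hat{g}_1(t), \hat{g}_2(t), \dots, \hat{g}_\Gamma(t))$. 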
5) Check if the current identification and verification angles correspond to a valid and unique $j$: This step 
determines if $i(t)$ is a leaf node for $S_\delta(x - \hat{x}(t))$. This operation is similar to the corresponding step in the exactly $k$-sparse 
case. The main difference here is that we perform the verification operation on each of the $\Gamma$ measurements 
separately and declare $i(t)$ as a leaf node only if it passes all the verification tests. The verification step for 
the $\gamma$-th measurement is given by the test: 

$$\hat{\theta}_t^{(V,\gamma)} = \theta_{i(t),j(t)}^{(V,\gamma)}.$$

If the above test succeeds for every $\gamma = 1, 2, \dots, \Gamma$, we set $\Delta x(t)$ to $|y_{i(t)}^{(I,\gamma)}(t)|$ if $\angle y_{i(t)}^{(I,\gamma)}(t) \in (-\pi/4, 3\pi/4]$, and 
to $-|y_{i(t)}^{(I,\gamma)}(t)|$ if $\angle y_{i(t)}^{(I,\gamma)}(t) \in (3\pi/4, 7\pi/4]$. Otherwise, we set $\Delta x(t) = 0$. This step requires at most $\Gamma$ verification 
steps and, therefore, can be completed in $O(1)$ steps. 
6) Update $\hat{x}(t+1)$, $y(t+1)$, and $\mathcal{D}(t+1)$: If the verification tests in the previous step failed, there are no 
updates to be done, i.e., set $\hat{x}(t+1) = \hat{x}(t)$, $y(t+1) = y(t)$, and $\mathcal{D}(t+1) = \mathcal{D}(t)$. 

Otherwise, we first update the current signal estimate to $\hat{x}(t+1)$ by setting the $j(t)$-th coordinate to $\Delta x(t)$. 
Next, let $i_1, i_2, i_3$ be the possible neighbours of $j(t)$. We compute the residual identification/verification vectors 
$y(t+1)$ at $i_1, i_2, i_3$ by subtracting the weight due to $\Delta x(t)$ at each of them. Finally, we update the neighbourly 
set by removing $i_1, i_2$, and $i_3$ from $\mathcal{D}(t)$ to obtain $\mathcal{D}(t+1)$. 
The decoding algorithm terminates after the $T$-th iteration, where $T = \min\{4k, \{t : \mathcal{D}(t+1) = \emptyset\}\}$. 
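The update in step 6 can be sketched as a small Python helper. For brevity the sketch tracks a single residual vector rather than the 2Γ identification/verification residuals, and the names (`peel`, `entries`, `active`) and the toy values are hypothetical.

```python
import cmath
import math

def peel(residual, estimate, entries, j, delta_x, neighbours_of_j, active):
    """Apply one successful update of step 6, in place.

    Sets the j-th coordinate of the estimate to delta_x, subtracts the
    weight due to delta_x from the residual at each neighbour of j, and
    removes those neighbours from the neighbourly set.
    """
    estimate[j] = delta_x
    for i in neighbours_of_j:
        residual[i] -= entries[(i, j)] * delta_x
        active.discard(i)

# Toy noiseless example: one non-zero coordinate j = 1 with value 2.5.
entries = {(0, 1): 1j, (2, 1): 1 + 0j, (3, 1): cmath.exp(1j * math.pi / 4)}
residual = [0j, 0j, 0j, 0j]
for (i, j), a in entries.items():
    residual[i] += a * 2.5
estimate = [0.0, 0.0, 0.0]
active = {0, 2, 3}
peel(residual, estimate, entries, 1, 2.5, [0, 2, 3], active)
```

In the noiseless toy case the residual vanishes after the update; with noise, a small residual is left behind at the neighbours, which is the propagated error analysed in Appendix F.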

IV. Conclusion 

In this work we present SHO-FA - an algorithm for sparse recovery that requires an information-theoretically 
order-optimal number of measurements, bits over all measurements, and decoding time-complexity, and as a bonus, 
with non-zero probability it can handle "data-base queries". The algorithm is robust to noisy signal tails and noisy 
measurements. The algorithm is "practical" (all constant factors involved are "small"), as validated by both our 
analysis, and simulations. Our algorithm can reconstruct signals that are sparse in any basis that is known a priori 
to both the encoder and decoder, and works for a large ensemble of "sparse" measurement matrices A. 

As future work we aim to use recent work by [32] and [34] to improve, respectively, the constant factors in our 
performance analysis and the reconstruction error. 
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Appendix 

A. Proof of Lemma 1 

Proof: It suffices to prove the desired property for all $S(x)$ of size exactly $k$. Let $S'(x) \subseteq S(x)$. Let 
$\{(s_1, t_1), (s_2, t_2), \dots, (s_{d|S'(x)|}, t_{d|S'(x)|})\}$ be the set of outgoing edges from $S'(x)$. Without loss of generality, 
we assume these edges are drawn in the following manner. 

In the initialization stage, we "split" every node on the right of Q into $dn/ck$ "virtual" nodes. Each virtual node 
represents a "true" node on the right. We maintain a set of "remaining" virtual nodes, from which we select and 
remove virtual nodes. 

To draw the edges, we visit the nodes in $S'(x)$ (on the left of Q) sequentially. For each node, we select uniformly 
at random a set of $d$ distinct virtual nodes from the remaining virtual node set. We form $d$ edges by connecting 
this node in $S'(x)$ and the true nodes on the right that those $d$ selected virtual nodes represent. After the $d$ edges 
are formed, we remove the $d$ selected virtual nodes from the remaining virtual node set, and proceed to the next 
node in $S'(x)$. 

In this way, we generate a bipartite graph that is both $d$ left-regular and $dn/ck$ right-regular; that is, each node 
on the left has a degree of $d$ and each node on the right has a degree of $dn/ck$. By using standard arguments of 

Footnote: We assume $dn/ck$ is an integer, with the understanding that in practice one can always increase $c$ to make $dn/ck$ an integer while the 
"fail-to-expand" probability is still bounded by the desired target $\epsilon$. 
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sequential implementation of random experiments, one can verify that the graph generated in this way is chosen 
uniformly at random from all bipartite graphs that are both d left-regular and dn/ck right-regular 

For each $i = 1, 2, \dots, d|S'(x)|$, the probability that the edge $(s_i, t_i)$ reaches an "old" true node (on the right) 
that is already reached by those edges generated ahead of $(s_i, t_i)$ is upper bounded as 

(z-l) 



Priue {h,...ti^i}) < 
y 



< 



ck 



Let $N(S'(x))$ be the set of all neighboring nodes of the nodes in $S'(x)$. The size of $N(S'(x))$ is no more than 
$2d|S'(x)|/3$ if and only if, out of the $d|S'(x)|$ edges, there exists a set of at least $d|S'(x)|/3$ edges that fail to reach "new" 
nodes (on the right). Exploiting this observation, we have 

Consequently, the probability that there exists one $S'(x) \subseteq S(x)$ so that $|N(S'(x))| \le 2d|S'(x)|/3$ can be bounded 
by 

In the above, the inequality in (7) follows from Stirling's approximation; the upper bound in (8) is derived by 
noting that the first term in the sum takes its maximum when $j = [\sqrt{k}]$ and the second term is maximum when 
$j = k$; the final bound is obtained by noting that the second term is a geometric progression. 

Finally, we plug in the choice of $d = 7$ to complete the proof. ■ 

B. Proof of Lemma 2 

Suppose each set $S(x)$ of $k$ nodes on the left of Q has strictly more than $d/2$ times as many neighbouring 
nodes as there are nodes in $S(x)$. Then by standard arguments in the construction of expander 
codes [33] (sketched in the footnote to this proof), this implies the existence of a linear code of rate at least $1 - m/n$, and with relative minimum distance 
at least $k/n$. But by the Hamming bound [44], it is known that codes of minimum distance $\delta$ can have rate at 
most $1 - H(\delta)$, where $H(\cdot)$ denotes the binary entropy function. Since $k = o(n)$, $\delta = k/n \to 0$. But in this regime 
$1 - H(\delta) \to 1 - \delta \log(1/\delta)$. Comparing $(k/n) \log(n/k)$ with $m/n$ gives the required result. ■ 
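The small-δ behaviour of the binary entropy function used in this proof can be checked numerically. The sketch below assumes logarithms to base 2 and uses a hypothetical helper name `binary_entropy`; it compares H(δ) with δ log(1/δ) for a few small values of δ.

```python
import math

def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Ratio H(delta) / (delta * log2(1 / delta)) for shrinking delta.
ratios = [binary_entropy(dl) / (dl * math.log2(1 / dl))
          for dl in (1e-2, 1e-4, 1e-6)]
```

The ratios decrease towards 1 as δ shrinks, which is the sense in which H(δ) behaves like δ log(1/δ) in the regime k = o(n).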

C. Proof of Lemma 3 

For any set of nodes $S$ in the graph Q, we define $N(S)$ as the set of neighboring nodes of the nodes in $S$. For 
any set $S'(x) \subseteq S(x)$, we define $\beta$ as the portion of the nodes in $N(S'(x))$ that are $S'(x)$-leaf nodes. 
First, each node $v \in N(S'(x))$ is of one of the following two types: 

1) It has only one neighboring node in $S'(x)$, on the left of Q. By the definition of $\beta$, the number of nodes in 
$N(S'(x))$ of this type is $\beta|N(S'(x))|$. 

2) It has at least two neighboring nodes in $S'(x)$, on the left of Q. The number of nodes in $N(S'(x))$ of this 
type is $(1 - \beta)|N(S'(x))|$. 

We have two observations. First, since the degree of each node in $S'(x)$ is $d$, the total number of edges from 
$S'(x)$ to $N(S'(x))$ is at most $d|S'(x)|$ and the number of nodes in $N(S'(x))$ is at most $d|S'(x)|$. 
Second, the total number of edges entering $N(S'(x))$ from $S'(x)$ is at least 

$$\beta|N(S'(x))| + 2(1 - \beta)|N(S'(x))| = (2 - \beta)|N(S'(x))|,$$

as the number of neighboring nodes for the nodes of Type 1 is one and of Type 2 is at least two. 
Combining the above two observations, we get the following inequality: 

$$(2 - \beta)|N(S'(x))| \le d|S'(x)|.$$

According to the setting of the Lemma, we also have $|N(S'(x))| \ge 2d|S'(x)|/3$. Therefore, it follows that 

$$2(2 - \beta)d|S'(x)|/3 \le d|S'(x)|,$$

and consequently $\beta \ge 1/2$. ■ 
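The edge-counting argument can be checked on a random left-regular graph with a few lines of Python. The graph sizes, the seed, and the helper name `leaf_fraction` are assumptions made for illustration.

```python
import random

def leaf_fraction(neighbours, subset):
    """Return (beta, |N(S')|, number of edges leaving S').

    neighbours[j] lists the right nodes adjacent to left node j, and
    beta is the fraction of nodes in N(S') with exactly one neighbour in S'.
    """
    count = {}
    for j in subset:
        for i in neighbours[j]:
            count[i] = count.get(i, 0) + 1
    leaves = sum(1 for c in count.values() if c == 1)
    edges = sum(count.values())
    return leaves / len(count), len(count), edges

rng = random.Random(1)
n, m, d = 200, 60, 7                        # toy sizes (assumed)
neighbours = [rng.sample(range(m), d) for _ in range(n)]
subset = rng.sample(range(n), 5)            # plays the role of S'(x)
beta, num_nbrs, edges = leaf_fraction(neighbours, subset)
```

Whenever the sampled subset expands by the factor 2d/3, the inequality (2 - β)|N(S')| ≤ d|S'| forces β ≥ 1/2, exactly as in the proof.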

D. Proof of Lemma 4 

Consider the algorithm $\mathcal{A}$ that proceeds as follows. First, among the set of all right nodes that neighbour $j$, check 
if there exists a node $i$ such that $y_i^{(I)} = y_i^{(V)} = 0$. If there exists such a node, then output $\hat{x}_j = 0$. Otherwise, check 
if there exists an $S(x)$-leaf node among the neighbours of $j$. This check can be performed by using verification and 
identification observations as described for the SHO-FA reconstruction algorithm. If there exists a leaf node, say $i$, 
then output $\hat{x}_j = |y_i|$. Else, the algorithm terminates without producing any output. 

To see that the above algorithm satisfies the claimed properties, consider the following two cases. 
Case 1: $x_j = 0$. In this case, $\hat{x}_j = 0$ is output if at least one neighbour of $j$ lies outside $N(S(x))$. Since $N(S(x))$ 
has at most $dk$ elements, the probability that a neighbour of $j$ lies inside $N(S(x))$ is at most $dk/ck = d/c$. Thus, the 
probability that at least one of the neighbours of $j$ lies outside $N(S(x))$ is at least $1 - (d/c)^d$. The algorithm incorrectly 
reconstructs $x_j$ if all neighbours of $j$ lie within $N(S(x))$ and SHO-FA incorrectly identifies one of these nodes as 
a leaf node. By the analysis of SHO-FA, this event occurs with probability $o(1/k)$. 

Case 2: $x_j \neq 0$. For $\mathcal{A}$ to produce the correct output, it has to identify one of the neighbours of $j$ as a leaf. The 
probability that there exists a leaf among the neighbours of $j$ is at least $1 - (d/c)^d$ by an argument similar to the 
previous case. Similarly, the probability of erroneous identification is $o(1/k)$. ■ 

Footnote (to the proof of Lemma 2): For the sake of completeness we sketch such an argument here. Given such an expander graph Q, one can construct an $n \times k$ binary 
matrix $A$ with 1s in precisely those locations where the $i$-th node on the left is connected with the $j$-th node on the right. Treating this 
matrix $A$ as the parity check matrix of a code over a block-length $n$ implies that the rate of the code is at least $k/n$, since the parity-check 
matrix imposes at most $k$ constraints on the $n$ bits of the codewords. Also, the minimum distance is at least $k$. Suppose not, i.e., there exists 
a codeword in this linear code of weight less than $k$. Let the support of this codeword be denoted $S(x)$. Then by the expansion property 
of Q, there are strictly more than $|S(x)|d/2$ neighbours of $S(x)$. But this implies that there is at least one node, say $v$, neighboring $S(x)$ 
which has exactly one neighbor in $S(x)$. But then the constraint corresponding to $v$ cannot be satisfied, leading to a contradiction. 



E. Proof of Lemma 5 

First, we find an upper bound on the maximum possible phase displacement in $y_i$ due to fixed noise vectors 
$z$ and $e$. Let $\Delta\theta_i$ be the difference in phase between the "noiseless" output $(A'x)_i$ and the actual output $y_i = 
(A'(x + z) + e)_i$. Figure 11a shows this geometrically. By a straightforward geometric argument, for fixed $z$ and 
$e$, the phase displacement $\Delta\theta_i$ is upper bounded by $\pi|(A'z)_i + e_i|/|(A'x)_i|$. Since $i$ is a leaf node for $S(x)$, 
$|(A'x)_i| \ge \delta/k$. Therefore, 

$$\Delta\theta_i \le \pi|(A'z)_i + e_i| k/\delta.$$

Thus, 

$$\Pr_{z,e}(\Delta\theta_i > \alpha) \le \Pr_{z,e}\left(\pi|(A'z)_i + e_i| k/\delta > \alpha\right) = \Pr_{z,e}\left(|(A'z)_i + e_i| > \alpha\delta/\pi k\right).$$

Since each $z_j$ is a Gaussian with zero mean and variance $\sigma_z^2$, $(A'z)_i$ is a complex Gaussian with zero mean and 
variance at most $n\sigma_z^2$. Further, each row of $A'$ has at most $dn/ck$ non-zero entries. Therefore, $(A'z)_i + e_i$ is a zero 
mean complex Gaussian with variance at most $(dn/ck)\sigma_z^2 + \sigma_e^2$. 
The expected value of $\Delta\theta_i$ is bounded as follows: 

$$E_{z,e}(\Delta\theta_i) \le E_{z,e}\left(\pi|(A'z)_i + e_i| k/\delta\right) \le \frac{\pi k}{\delta} \sqrt{\frac{2\left(dn\sigma_z^2/ck + \sigma_e^2\right)}{\pi}} = \sqrt{\frac{2\pi k^2 \left(dn\sigma_z^2/ck + \sigma_e^2\right)}{\delta^2}}.$$

Finally, applying standard bounds on the tail probabilities of Gaussian random variables, the required probability 
is upper bounded by e^(° /^'^)/2. ■ 



F. Probability of error 

An error occurs only if one of the following takes place: 

1) The underlying graph Q is not an $S(x)$-expander. This probability can be made $o(1/k)$ by choosing $m = ck$, 
where the constant $c$ is determined by Lemma 1. 

2) The phase noise in $y_{i(t)}(t)$ leads to an incorrect decoding of $\hat{\theta}_t^{(I,\gamma)}$ or $\hat{\theta}_t^{(V,\gamma)}$ for some $\gamma$ and $t$. 
Note that the phase noise in $y_{i(t)}(t)$ consists of: 

a) The contribution due to noise vectors $z$ and $e$, and 

b) The contribution due to the noise propagated while computing each $y_{i(t)}(\tau)$ from $y_{i(t)}(\tau - 1)$ for $\tau \le t$. 
The contribution due to the first term is bounded by Lemma 5. Thus, for a target error probability $\epsilon'$, we 
choose $\alpha = (1/2) \log(1/2\epsilon')$, giving a contribution to the phase noise of at most 

$$\frac{\log(1/2\epsilon')}{2} \sqrt{\frac{2\pi k^2 \left(dn\sigma_z^2/ck + \sigma_e^2\right)}{\delta^2}}.$$
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To bound the contribution due to the second term, note that at each iteration t, any error in reconstruction of 
$x_{j(t)}$ potentially adds to reconstruction error in all future iterations $t'$ for which there is a path from $j(t)$ to 
$j(t')$. Since the restriction of Q to $S(x)$ and its neighbours is a sparse graph, it follows from [50] that, with 
probability $1 - o(1/k)$, it consists only of disjoint components of size $O(\log k)$ (see [34] for such an analysis). 
Thus, the magnitude error in reconstruction of $x_{j(t)}$ due to noisy reconstructions in previous iterations is 

$$O\left((\log k)^2 \log(1/\epsilon') \sqrt{n\sigma_z^2/k + \sigma_e^2}\right). \qquad (11)$$



Thus, the phase displacement in each $y_i^{(I,\gamma)}$ and $y_i^{(V,\gamma)}$ is 



Therefore, as long as 

the probability of any single phase being incorrectly detected is upper bounded by $\epsilon'$. Since there are a 
total of $8\Gamma k$ possible phase measurements, we choose $\epsilon' = O(1/\Gamma k^2)$ to achieve a target error probability 
$O(1/k)$. 

3) The verification step passes for each measurement in the $t$-th iteration, even though $i(t)$ is not a leaf node 
for $S(t)$. 

4) $\mathcal{D}(T) \neq \emptyset$, i.e., the algorithm terminates without recovering all $x_j$'s. Note that, similar to the exactly $k$-sparse 
case, in each iteration $t$, by Lemma 3 the probability that $i(t)$ is a leaf node for $S_\delta(x - \hat{x}(t))$ is at least 1/2. 
However, due to noise, there is a non-zero probability that even when $i(t)$ is a leaf node, it does not pass 
the verification tests. We know from the analysis for the previous case that this probability is $O(1/k)$ for 
each $i(t)$. Therefore, the probability that a randomly picked $i(t)$ passes the verification test is $1/2 - O(1/k)$. 
Thus, in expectation, the number of iterations required by the algorithm is $2k/(1 - O(1/k))$. By concentration 
arguments, it follows that the probability that the algorithm does not terminate in $4k$ iterations is $o(1/k)$ as $k$ 
grows without bound. 

G. Estimation error 

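Next, we bound the error in estimating $x$. We first find an upper bound on $\|\hat{x} - x_{S_\delta^c}\|_1$ that holds with a high 
probability. Applying the bound in (11), for each $t = 1, 2, \dots, T$, 

$$|\hat{x}_{j(t)} - x_{j(t)}| = O\left((\log k)^2 \log(1/\epsilon') \sqrt{n\sigma_z^2/k + \sigma_e^2}\right)$$

with probability $1 - o(1/k)$. Therefore, 

$$\|\hat{x} - x_{S_\delta^c}\|_1 = \sum_{\substack{1 \le t \le T \\ t : j(t) \notin S_\delta}} |\hat{x}_{j(t)} - x_{j(t)}| + \sum_{\substack{1 \le t \le T \\ t : j(t) \in S_\delta}} |\hat{x}_{j(t)}|$$

$$\le \sum_{j \notin S_\delta} |\hat{x}_j - x_j| + \sum_{j \in S_\delta} |\hat{x}_j - x_j| + \sum_{j \in S_\delta} |x_j|$$

$$= O\left(k (\log k)^2 \log(1/\epsilon') \sqrt{n\sigma_z^2/k + \sigma_e^2}\right) + \delta \quad \text{w.p. } 1 - o(1/k)$$

$$= O\left(k (\log k)^2 \log(1/\epsilon') \left(\sqrt{n\sigma_z^2/k} + \sigma_e\right) + \delta\right). \qquad (13)$$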

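Next, note that $\|z\|_1 = \sum_{j=1}^{n} |z_j|$ and $\|e\|_1 = \sum_{i=1}^{m} |e_i|$. Since each $z_j$ is a Gaussian random variable with variance 
$\sigma_z^2$, the expected value of $|z_j|$ is $\sigma_z\sqrt{2/\pi}$. Therefore, for every $\epsilon' > 0$, for $n$ large enough, 

$$\Pr\left(\|z\|_1 < (1/2) n \sigma_z \sqrt{2/\pi}\right) < \epsilon'. \qquad (14)$$

Similarly, for $m$ large enough, 

$$\Pr\left(\|e\|_1 < (1/2) ck \sigma_e \sqrt{2/\pi}\right) < \epsilon'. \qquad (15)$$

Combining inequalities (13)-(15), we have, with a high probability, 

$$\|\hat{x} - x_{S_\delta^c}\|_1 = O\left(k (\log k)^2 \log(1/\epsilon') \left(\sqrt{n\sigma_z^2/k} + \sigma_e\right)\right) + \delta$$

$$= O\left(\sqrt{\frac{k}{n}} (\log k)^2 \log(1/\epsilon') \|z\|_1 + (\log k)^2 \log(1/\epsilon') \|e\|_1\right) + \delta. \qquad (16)$$

Next, applying the bound in (3), we obtain 

$$\|\hat{x} - x\|_1 \le O\left(\sqrt{\frac{k}{n}} (\log k)^2 \log(1/\epsilon') \|z\|_1 + (\log k)^2 \log(1/\epsilon') \|e\|_1\right) + 2\delta \qquad (17)$$

with a high probability. 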
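The concentration of $\|z\|_1$ around $n\sigma_z\sqrt{2/\pi}$, which underlies the bound in (14), can be illustrated with a short Monte Carlo sketch in Python. For simplicity the sketch draws all n entries as Gaussian (it ignores the k coordinates in the support of x), and the sample size, seed, and variable names are assumptions.

```python
import math
import random

rng = random.Random(0)
sigma_z, n = 0.5, 20000                     # toy values (assumed)

# Draw z with i.i.d. N(0, sigma_z^2) entries and compute its l1 norm.
z = [rng.gauss(0.0, sigma_z) for _ in range(n)]
l1_norm = sum(abs(v) for v in z)

# E|z_j| = sigma_z * sqrt(2 / pi), so E||z||_1 = n * sigma_z * sqrt(2 / pi).
expected = n * sigma_z * math.sqrt(2 / math.pi)
relative_gap = abs(l1_norm / expected - 1.0)
```

For n this large the relative gap is typically well under one percent, so the event $\|z\|_1 < (1/2) n \sigma_z \sqrt{2/\pi}$ in (14) is very unlikely.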



H. Proof of Theorem 2 

Finally, to complete the proof of Theorem 2, we let $\delta = \min\{O(n\sigma_z), o(1)\}$. By (14), with a high probability, 
$\delta = O(\|z\|_1)$. Finally, recall the assumption that $k = O(n^{1-\Delta})$. Applying these to the bound obtained in (17), we 
get 

$$\|\hat{x} - x\|_1 \le C\left(\|z\|_1 + (\log k)^2 \|e\|_1\right)$$

for an appropriate constant $C = C(\epsilon)$. 



I. Simulation Results 

This section describes simulations that use synthetic data. The $k$-sparse signals used here are generated by 
randomly choosing $k$ locations for non-zero values and setting the non-zero values to 1. The contours in each plot 
show the probability of successful reconstruction (the lighter the color, the higher the probability of reconstruction). 
The probability of success at each data point in the plots was obtained by running multiple simulations (400 in Fig. 13 
and Fig. 14, and 200 in Fig. 15) and noting the fraction of simulations which resulted in successful reconstruction. 
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Figure 13. Exactly sparse signal and noiseless measurements 
- reconstruction performance for fixed signal length n: The 
y-axis denotes the number of measurements m, and the x-axis 
denotes the sparsity k, for fixed signal length n = 1000. The 
simulation results show that the number of measurements m grows 
roughly proportionally to the sparsity k for a fixed probability of 
reconstruction error. Also note that there is a sharp transition 
in reconstruction performance once the number of measurements 
exceeds a linear multiple of k. The red line denotes the curve 
where the probability of successful reconstruction equals 0.98. For 
k = 150, the probability of success equals 0.98 when m = 450 
and c = m/k = 3. 
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Figure 14. Exactly sparse signal and noiseless measurements - 
reconstruction performance for fixed sparsity k: The number of 
measurements m is plotted on the y-axis against log(n) 
on the x-axis; the sparsity k is fixed to be 20. Note that there is 
no scaling of m with n, as guaranteed by our theoretical bounds. 
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Sparsity (k) 

Figure 15. Approximately sparse signal and noisy measurements - reconstruction performance for fixed signal length n: As in 
Fig. 13, the y-axis denotes the number of measurements m, and the x-axis denotes the sparsity k, for fixed signal length n = 1000. In this 
case, we set = 0.03, and allowed relative reconstruction error of at most 0.3. 



